We revisit known transformations from Jinja bytecode to rewrite systems
from the viewpoint of runtime complexity. Suitably generalising the
constructions proposed in the literature, we define an alternative representation
of Jinja bytecode (JBC) executions as computation graphs from which we
obtain a representation of JBC executions as constrained rewrite systems.
We prove non-termination and complexity preservation of the transformation.
More precisely the runtime complexity of a given JBC program and
the runtime complexity of the resulting rewrite system are related by a linear
factor.

1. Introduction 



In recent years research on complexity of rewrite systems has matured and a number
of noteworthy results could be established. We give a quantitative assessment based on
the annual competition of complexity analysers within TERMCOMP. With respect to
last year's run of TERMCOMP, we see a success rate of 38 % in the category Runtime
Complexity - Innermost Rewriting. Note that the corresponding testbed is not restricted
to polynomial runtime complexity in any way. With respect to a qualitative assessment
we want to mention the very recent efforts to apply methods from linear algebra and
automata theory to complexity [22], recent efforts on adaption of the dependency pair
method to complexity [15, ?, 3, 17] and the ongoing quest to incorporate
compositionality [?, ?]. (See [?] for an overview in methods of complexity analysis of
term rewrite systems.)

We are concerned with the question of applicability of these results to establish auto- 
mated runtime complexity analysis of imperative programs, in particular of Jinja byte- 
code (JBC) programs. Jinja is a Java-like language that exhibits the core features of 
Java [33]. Its semantics is clearly defined and machine checked in the theorem prover 
Isabelle/HOL.

Our work is motivated by recent efforts to establish a non-termination preserving
transformation from JBC programs to integer term rewrite systems (ITRSs) [27, ?, ?, ?].
Building upon [32, ?] this method makes use of termination graphs. Termination graphs
essentially are control-flow graphs using terms as abstraction domain, such that any eval- 
uation is abstracted to a path in the graph. Then the termination graph becomes repre- 
sentable as an ITRS. As existing termination techniques can be adapted to ITRSs with 
relative ease, the termination graph method yields a competitive termination analysis of 
JBC programs. 

Lifting this technique to an (automatable) complexity analysis is challenging. First,
we have to establish that the transformation is sound from the viewpoint of complexity,
i.e., we have to establish complexity preservation: the runtime complexity of a given JBC
program P is a (polynomial) function of the runtime complexity of an ITRS $\mathcal{R}$. Second,
we have to adapt existing methods of complexity analysis to ITRSs. Third, in order to
make the analysis competitive, we have to provide composability of the obtained analysis,
as is present in existing complexity tools for imperative programs [13, ?, ?, ?, ?, ?].

In this paper we are mainly concerned with the first challenge. We will come back to 
the two remaining challenges in the conclusions. Suitably generalising the constructions 
proposed in the literature, we define an alternative representation of JBC executions 
as computation graphs from which we obtain a representation of JBC executions as 
constrained term rewrite systems (cTRSs for short). CTRSs form a special type of 
rewrite systems that allow the formulation of conditions C over a theory T, such that a 
rule can only be used if the condition C is satisfied in T. Constraints are used to express 
relations on program variables. In our analysis we restrict to well-formed JBC programs 
that only make use of non-recursive methods and expect tree-shaped objects as input. 
Our main novel contributions are to extend results in the literature: 

1. by basing our analysis directly on Jinja bytecode without relying on a boxed heap- 
model, 

2. by providing a new graph-based representation of abstractions of JVM states that 
gives rise to a simplification and precision of widening of states, 

3. by showing that any JBC program P subject to our analysis can be abstracted to 
a finite computation graph, which becomes easily representable as a cTRS, 

4. by establishing complexity preservation of the transformation from JBC programs
P to constrained TRSs $\mathcal{R}$ by a linear factor, i.e., the runtime complexity with
respect to P is linearly related to the runtime complexity with respect to $\mathcal{R}$.



As a corollary to these results, we obtain a transformation from imperative programs to
rewrite systems that is non-termination preserving, i.e., any infinite evaluation of a JBC
program P gives rise to an infinite derivation over the obtained rewrite system $\mathcal{R}$. We
emphasise that the proposed transformation is not directly automatable, but requires
an extension by an external shape analysis (see for example [?, ?, ?]). Essentially
exploiting the annotation technique initially proposed in [27], we have implemented a
prototype of the approach to test its viability.



2. Related Work 

Our work overlaps most with earlier results reported for the termination graph
method, in particular with the work by Otto et al. [27] and Brockschmidt et al. [?] that
essentially share the same restrictions on the type of JBC programs analysed. The
approach has been implemented in AProVE. In contrast to these results our approach
provides an alternative representation of abstract states that relies only on one simple
form of annotations. In particular the tedious bookkeeping of annotations is avoided.
Furthermore, while [27] (and follow-up work) rely on (unspecified) heuristics to guarantee
a finite termination graph, we make the employed notion of widening precise and show that
finiteness of the computation graph can be guaranteed, even if widening is performed
gently. On the downside our approach is not directly automatable as the translation to
cTRSs relies on external analysis (or annotations) for cyclic or non-tree shaped
data-structures.

Termination behaviour and complexity of such programs is studied by Albert et al.
in [?, ?]. The approach employs program transformations to constrained logic programs
and has been successfully implemented in a tool; it often allows surprisingly
precise bounds on the resource usage and is not restricted to runtime complexity. Related
work, targeting C programs, has been reported by Alias et al. [3]. A theoretical limitation
of the work is the focus on a path-length analysis of the heap, which does not provide
the same detail as the term based abstraction presented here.

Gulwani and Zuleger study the reachability-bound problem that translates into bounds
on the runtime complexity of a program [14]. In Zuleger et al. [35] the general
methodology of [14] is refined by the use of size-change abstraction. The latter approach has
been implemented in the tool LOOPUS. In connection with pathwise analysis and
contextualisation size-change abstraction yields a powerful analysis. Both approaches rely
on the use of (standard) invariant generation tools to link the bound program variable
to the input. Furthermore, Gulwani et al. [12, 13] propose counter instrumentation of
the code for a complexity analysis. Generally speaking, these methods are closely linked
to Microsoft Research's SPEED-project and study C programs. Our approach extends
the use of transition systems by cTRSs, which theoretically form a strict extension.
Furthermore, as our methods are rooted in rewriting we are not limited to the powers of
invariant generation tools.

This paper is structured as follows. In Sections 3 and 4 we fix some basic notions to
be used in the sequel. In particular, we give an overview over the Jinja programming
language. Our notion of abstract states is presented in Sections 5 and 6, while
computation graphs are proposed in Section 7. Section 8 introduces cTRSs and presents
the transformation from computation graphs to rewrite systems. In Section 9 we briefly
mention crucial design choices for our prototype implementation. Finally, in Section 10
we conclude.



3. Preliminaries 

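Let f be a mapping from A to B, denoted $f\colon A \to B$, then $dom(f) = \{x \mid f(x) \in B\}$
and $rg(f) = \{f(x) \mid x \in A\}$. Let $a \in dom(f)$. We define:

$$
f\{a \mapsto v\}(x) :=
\begin{cases}
v & \text{if } x = a \\
f(x) & \text{otherwise.}
\end{cases}
$$
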
We usually use square brackets to denote a list. Further, (::) denotes the cons operator, 
and (@) is used to denote the concatenation of two lists. 

Definition 3.1. A directed graph $G = (V_G, Succ_G, L_G)$ over the set $\mathcal{L}$ of labels is a
structure such that $V_G$ is a finite set, the nodes or vertices, $Succ_G\colon V_G \to V_G^*$ is a mapping
that associates a node u with an (ordered) sequence of nodes, called the successors of
u. Note that the sequence of successors of u may be empty: $Succ_G(u) = []$. Finally
$L_G\colon V_G \to \mathcal{L}$ is a mapping that associates each node u with its label $L_G(u)$. Let u, v be
nodes in G, such that $v \in Succ_G(u)$, then there is an edge from u to v in G; the edge from
u to v is denoted as $u \to v$.

Definition 3.2. A structure $G = (V_G, Succ_G, L_G, E_G)$ is called directed graph with edge
labels if $(V_G, Succ_G, L_G)$ is a directed graph over the set $\mathcal{L}$ and $E_G$ is a mapping from
the edges of G to $\mathcal{L}$ that associates each edge e with its label $E_G(e)$. Edges in G are
denoted as $u \xrightarrow{\ell} v$, where $\ell = E_G(u \to v)$ and $u, v \in V_G$. We often write $u \to v$ if the
label is either not important or is clear from context.

If not mentioned otherwise, in the following a graph is a directed graph with edge labels.
Usually nodes in a graph are denoted by $u, v, \dots$ possibly followed by subscripts. We
drop the reference to the graph G from $V_G$, $Succ_G$, and $L_G$, i.e., we write G = (V, Succ, L)
if no confusion can arise from this. Further, we also write $u \in G$ instead of $u \in V$.

Let G = (V, Succ, L) be a graph and let $u \in G$. Consider $Succ(u) = [u_1, \dots, u_k]$. We
call $u_i$ ($1 \leq i \leq k$) the i-th successor of u (denoted as $u \xrightarrow{i}_G u_i$). If $u \xrightarrow{i}_G v$ for some i,
then we simply write $u \to_G v$. A node v is called reachable from u if $u \to_G^* v$, where $\to_G^*$
denotes the reflexive and transitive closure of $\to_G$. We write $\to_G^+$ for $\to_G \circ \to_G^*$. A graph
G is acyclic if $u \to_G^+ v$ implies $u \neq v$. We write $G \upharpoonright u$ for the subgraph of G reachable
from u.
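
The graph notions above can be made concrete in a few lines. The following is only a
sketch under assumptions that are not part of the text: nodes are strings, the
successor mapping is a Python dictionary, and the function names reachable, is_acyclic
and subgraph are hypothetical.

```python
from typing import Dict, List, Set

# Assumed encoding: Succ maps every node to its ordered list of successors.
Succ = Dict[str, List[str]]


def reachable(succ: Succ, u: str) -> Set[str]:
    """All v with u ->* v (reflexive and transitive closure of the edge relation)."""
    seen = {u}
    todo = [u]
    while todo:
        n = todo.pop()
        for v in succ[n]:
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen


def is_acyclic(succ: Succ) -> bool:
    """G is acyclic if u ->+ v implies u != v, i.e. no node reaches itself
    along a path with at least one edge."""
    for u in succ:
        for v in succ[u]:
            if u in reachable(succ, v):
                return False
    return True


def subgraph(succ: Succ, u: str) -> Succ:
    """The subgraph of G that is reachable from u."""
    keep = reachable(succ, u)
    return {n: succ[n] for n in keep}


g = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["a"]}
print(sorted(reachable(g, "a")), is_acyclic(g))
```

Successor lists are kept ordered because the i-th successor of a node is
significant in the definitions above.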

4. Jinja Bytecode 

In this section, we give an overview over the Jinja programming language. In
particular we inspect the internal state of the Jinja Virtual Machine (JVM). We expect the
reader to be familiar with the Java programming language.

Definition 4.1. A value in Jinja can be a Boolean, an integer, a reference (or address), 
the null reference (null), or the dummy value (unit). 

We usually refer to (non-null) references as addresses. The dummy value unit is
used for the evaluation of assignments (see [18]) and also used in the JVM to allocate
uninitialised local variables.

Example 4.1. Figure 1 depicts a program defining a List object with the append
method. Deviating from the notation employed by Klein and Nipkow in [18], we present
Jinja code in a Java-like format.



class List{
  List next;
  int val;

  unit append(List ys){
    List cur = this;
    while (cur.next != null){
      cur = cur.next;
    }
    cur.next = ys;
  }
}



Figure 1: The List program. 

In preparation for the subsequent sections, we recall the structure and properties of JBC
programs and the JVM.

Definition 4.2. A JBC program consists of a set of class declarations. Each class is 
identified by a class name and further consists of the name of its direct superclass, field 
declarations and method declarations. A field declaration is a pair of field name and field 
type. A method declaration consists of the method name, a list of parameter types, the 
result type and the method body. 

A JBC method body is a triple of (nat × nat × instructionlist). The two numbers
represent the maximum size of the operand stack and the number of local variables, not
including the this pointer and the parameters of the method, while instructionlist gives
a sequence of JBC instructions. The set of Jinja bytecode instructions is adapted for our
needs and listed in Figure 2. We employ the following conventions: Let n denote a natural
number, i an integer, v a Jinja value, cn a class name, and mn a method name.



• Load n
• Store n
• Push v
• Pop
• New cn
• Getfield fn cn
• Putfield fn cn
• Checkcast cn
• IAdd
• ISub
• IfFalse i
• Goto i
• CmpEq
• CmpNeq
• CmpGeq
• BAnd
• BOr
• BNot
• Invoke mn
• Return

Figure 2: The Jinja bytecode instruction set.



Definition 4.3. A (JVM) state is a pair consisting of the heap and a list of frames. Let
$\prec$ denote the strict subclass relation and $\preceq$ its reflexive closure. A heap is a mapping
from addresses to objects, where an object is a pair (cn, ftable) such that:

• cn denotes the class name, and

• ftable denotes the fieldtable, i.e., a mapping from (cn', fn) to values, where fn is
a field name and cn' is a (not necessarily proper) superclass of cn, i.e., $cn \preceq cn'$.

A frame represents the environment of a method and is a quintuple (stk, loc, cn, mn, pc),
such that:

• stk denotes the operation stack, i.e., an array of values,

• loc denotes the registers, i.e., an array of values,

• cn denotes the class name,

• mn denotes the method name, and

• pc is the program counter.

Let stk (loc) denote the operation stack (registers) of a given frame. Typically the
structure of loc is as follows: the 0th register holds the this-pointer, followed by the
parameters and the local variables of the method. Uninitialised registers are preallocated
with the dummy value unit. We denote the entries of stk (loc) by stk(i) (loc(i)) for
$i \in \mathbb{N}$ and write dom(stk) (dom(loc)) for the set of indices of the array stk (loc). Often
there is no need to separate between the local variables of a Jinja program and the
registers in a JBC program. Hence we use registers and local variables interchangeably.

$$
\frac{(heap, (i_2 :: i_1 :: stk, loc, cn, mn, pc) :: frms)}
     {(heap, ((i_2 + i_1) :: stk, loc, cn, mn, pc + 1) :: frms)}
$$

Figure 3: The IAdd bytecode instruction.

Figure 3 illustrates the one-step execution of the IAdd bytecode instruction. We have
extended the original set of instructions by some standard operations on values, taking
ideas from Jinja with Threads into account [?, ?]. The semantics of all employed JBC
instructions can be found in the Appendix.
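
The one-step execution of IAdd can be read as a function on states. The sketch below
is an illustration only: the Frame record and the function name step_iadd are
hypothetical, and the operand stack is encoded as a tuple whose first element is the
top of the stack, mirroring the cons notation of Figure 3.

```python
from typing import List, NamedTuple, Tuple


class Frame(NamedTuple):
    """Assumed encoding of a frame (stk, loc, cn, mn, pc)."""
    stk: Tuple[int, ...]      # operand stack, top element first
    loc: Tuple[object, ...]   # registers
    cn: str                   # class name
    mn: str                   # method name
    pc: int                   # program counter


def step_iadd(heap: dict, frms: List[Frame]):
    """Execute IAdd on the topmost frame: pop two integers, push their sum,
    and advance the program counter by one. Heap and other frames are unchanged."""
    top, rest = frms[0], frms[1:]
    i2, i1, *stk = top.stk
    new_top = top._replace(stk=(i2 + i1, *stk), pc=top.pc + 1)
    return heap, [new_top] + rest


heap, frms = step_iadd({}, [Frame((2, 3, 7), (), "C", "m", 4)])
print(frms[0].stk, frms[0].pc)
```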

Example 4.2. Consider the List program from Example 4.1. Figure 4 depicts the
corresponding bytecode program, resulting from the compilation rules in [18]. In the
following we name the registers 0, 1, and 2 as this, ys, and cur, respectively.



Name: List
Classbody:
  Superclass: Object
  Fields:
    List next
    int val
  Methods:
    Method: unit append
      Parameters:
        List ys
      Methodbody:
        MaxStack: 2
        MaxVars: 1
        Bytecode:
          00  Load 0
          01  Store 2
          02  Push unit
          03  Pop
          04  Load 2
          05  Getfield next List
          06  Push null
          07  CmpNeq
          08  IfFalse 7
          09  Load 2
          10  Getfield next List
          11  Store 2
          12  Push unit
          13  Pop
          14  Goto -10
          15  Push unit
          16  Pop
          17  Load 2
          18  Load 1
          19  Putfield next List
          20  Push unit
          21  Return

Figure 4: The bytecode for the List program.

The bytecode verifier established in [18] ensures the following properties: All bytecode
instructions are provided with arguments of the expected type. No instruction tries to
get a value from the empty stack, nor puts more elements on the stack or accesses more
registers than specified in the method. The program counter is always within the code
array of the method. All registers except for the register storing this must be
written to before being accessed. Furthermore the verifier ensures that for states with equal
program counter the stacks are of equal size. Moreover, the list of registers is
of fixed length. The compiler presented in [18] transforms a well-formed Jinja program
into a well-formed JBC program. A JBC program that passes the bytecode verification
is again called well-formed.

While the set of instructions used here is a (slight) extension of the minimalistic set
considered in [18], this notion of well-formedness is still applicable, as all considered
extensions are present in Jinja with Threads [?, 21]. In the following we consider Jinja
programs and JBC programs to be well-formed. To ease readability we do not consider
exception handling, that is, an exception yields immediate termination of the program.
This is not a restriction of our analysis, as it could be easily integrated, but complicates
matters without gaining additional insight.

Let P be a program and let s and t be states. Then we denote by $P\colon s \xrightarrow{jvm}_1 t$ the
one-step transition relation of the JVM. If there exists a (normal) evaluation of s to t,
we write $P\colon s \xrightarrow{jvm} t$.

We define the runtime of a JVM for a given normal evaluation $P\colon s \xrightarrow{jvm} t$ as the
number of single-step executions in the course of the evaluation from s to t.

Definition 4.4. We define the runtime complexity with respect to P as follows:

$$
rc_{jvm}(n) := \max\{\, m \mid P\colon start \xrightarrow{jvm} t \text{ holds such that the runtime is } m,\
start \text{ is an initial state, and } \|start\| \leq n \,\} .
$$

Here $\|\cdot\|$ denotes a suitable size measure for states; the measure is made precise below.
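
To get a feeling for the runtime of a JVM evaluation, one can count single-step
executions of the append bytecode from Example 4.2 directly. The sketch below is not
the Jinja semantics from the Appendix but a minimal interpreter under stated
assumptions: jump offsets of IfFalse and Goto are taken relative to the current
program counter, instruction 00 is read as Load 0, heap cells are Python dictionaries,
null is None, and the helper names make_list and run_append are hypothetical.

```python
# The method body of append, transcribed from Example 4.2 (class arguments omitted).
PROGRAM = [
    ("Load", 0), ("Store", 2), ("Push", "unit"), ("Pop",),
    ("Load", 2), ("Getfield", "next"), ("Push", None), ("CmpNeq",),
    ("IfFalse", 7), ("Load", 2), ("Getfield", "next"), ("Store", 2),
    ("Push", "unit"), ("Pop",), ("Goto", -10), ("Push", "unit"), ("Pop",),
    ("Load", 2), ("Load", 1), ("Putfield", "next"), ("Push", "unit"),
    ("Return",),
]


def make_list(heap, n, base):
    """Allocate an acyclic list with n >= 1 cells; returns the address of its head."""
    for k in range(n):
        nxt = base + k + 1 if k < n - 1 else None
        heap[base + k] = {"next": nxt, "val": 0}
    return base


def run_append(heap, this, ys):
    """Run append and return the number of executed instructions (the runtime)."""
    stk = []                          # Python list, top of stack at the end
    loc = [this, ys, "unit"]          # registers: this, ys, cur
    pc, steps = 0, 0
    while True:
        op, *args = PROGRAM[pc]
        steps += 1
        if op == "Return":
            return steps
        if op == "Load":
            stk.append(loc[args[0]])
        elif op == "Store":
            loc[args[0]] = stk.pop()
        elif op == "Push":
            stk.append(args[0])
        elif op == "Pop":
            stk.pop()
        elif op == "Getfield":
            stk.append(heap[stk.pop()][args[0]])
        elif op == "Putfield":
            val = stk.pop()
            addr = stk.pop()
            heap[addr][args[0]] = val
        elif op == "CmpNeq":
            v2, v1 = stk.pop(), stk.pop()
            stk.append(v1 != v2)
        elif op == "IfFalse":
            if stk.pop() is False:
                pc += args[0]         # relative jump (assumption)
                continue
        elif op == "Goto":
            pc += args[0]             # relative jump (assumption)
            continue
        pc += 1


for n in (1, 2, 3):
    heap = {}
    this, ys = make_list(heap, n, 0), make_list(heap, 1, 100)
    print(n, run_append(heap, this, ys))
```

Under these assumptions the step count grows linearly in the length of the list
referenced by this, which is the kind of bound the runtime complexity function is
meant to capture.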

5. Abstract States 

In this section, we introduce abstract states as generalisations of JVM states. The
intuition is that abstract states represent sets of states in the JVM. The idea of
abstracting JVM states in this way is due to Otto et al. [27]. However, our presentation
crucially differs from [27] (and also from follow-up work in the literature) as we employ
an implicit representation of sharing that makes use of graph morphisms, rather than
the explicit sharing information proposed in [27, ?]. Furthermore, abstract states as
defined below are a straightforward generalisation of JVM states as defined in [18]. This
circumvents an additional transformation step as presented in [?].

Definition 5.1. We extend Jinja expressions by countably many abstract variables
$X_1, X_2, X_3, \dots$, denoted by $x, y, z, \dots$ An abstract variable may either abstract an object,
an integer or a Boolean value.

In denoting abstract variables typically the name is of less importance than the type,
that is we denote an abstract variable for an object of class cn simply as cn, while
abstract integer or Boolean variables are denoted as int and bool, respectively. The
(strict) subclass relation ($\prec$) is extended in the natural way to abstract variables for
classes.

Definition 5.2. An abstract value is either a value (cf. Definition 4.1), or an abstract
Boolean or integer value. As in the JVM, only (abstract) objects can be shared. In
particular note that abstract variables for objects are only referenced via the heap.

The next definition abstracts the heap of a JVM through the use of abstract variables
and values.

Definition 5.3. An abstract heap is a mapping from addresses to abstract objects, where
an abstract object is either a pair (cn, ftable) or an abstract variable. We define (partial)
projection functions cl and ft as follows:

$$
cl(obj) :=
\begin{cases}
cn & \text{if } obj \text{ is an object and } obj = (cn, ftable) \\
cn & \text{if } obj \text{ is an abstract variable of type } cn
\end{cases}
$$

$$
ft(obj) :=
\begin{cases}
ftable & \text{if } obj \text{ is an object and } obj = (cn, ftable) \\
\text{undefined} & \text{otherwise}
\end{cases}
$$

Abstract frames are defined like frames of the JVM, but registers and operand stack of
an abstract frame store abstract values. Furthermore, we define annotations of addresses
in a state s, denoted as iu. Formally, the annotations are pairs $p \neq q$ of addresses, where
$p, q \in heap$ and $p \neq q$. The intuition of iu is to express that for $p \neq q \in iu$, we disallow
sharing of these addresses in states represented by the state s.

Definition 5.4. An abstract state s = (heap, frms,iu) is a triple consisting of an ab- 
stract heap heap, a list of abstract frames frms, and a set of annotations iu. Further- 
more, we demand that all addresses in heap are reachable from local variables or stack 
entries in the list of frames frms. 

When depicting states, we replace stack and register indices by intuitive names, de- 
noted in roman font. Furthermore, we make use of the following conventions: we use an 
italic font (and lower-case) to describe abstract variables and a sans serif (and upper-case) 
to depict class names. 

Example 5.1 (continued from Example 4.1). Consider the List program from
Example 4.1 together with the well-formed JBC program depicted in Figure 4. Consider the
state A depicted below:

  A:  04 | ε | this = o1, ys = o2, cur = o1
      o1 = List(List.val = int, List.next = o3)
      o2 = list, o3 = list

The operation stack in A is empty. The registers this and cur contain the same address
o1 and ys is mapped to o2. In the heap o1 is mapped to an object of type List whose
value is abstracted to int and whose next element is referenced by o3. It is not difficult
to see that A forms an abstraction of any JVM state obtained at instruction 04 in the
List program (if this initially references a non-empty list) before any iteration of the
while-loop. Furthermore, consider the following state B:



  B:  04 | ε | this = o1, ys = o2, cur = o3
      o1 = List(List.val = int, List.next = o3)
      o2 = list, o4 = list
      o3 = List(List.val = int, List.next = o4)

Again it is not difficult to see that B abstracts any JVM state obtained if exactly one
iteration of the loop has been performed.

In the presence of abstract variables an abstract state represents a set of JVM states. 
As it will always be clear from the context, whether we refer to an abstract or a JVM 
state, we drop the qualifier "abstract" in the following. 

Definition 5.5. We define a preorder on abstract (non-address) values and objects;
$v \leq w$ holds if v = w or

• v = unit and w = null or w is an abstract variable of type int, bool, or cn,

• v = null and w is cn,

• $v \in \mathbb{Z}$ or v an abstract integer and w an abstract integer,

• $v \in \{true, false\}$ or v an abstract Boolean and w an abstract Boolean, or

• cl(v) = cn' and w is an abstract variable of type cn such that $cn' \preceq cn$.

We write $w \geq v$, if $v \leq w$.
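
The case distinction of Definition 5.5 translates almost literally into a predicate.
The following is a minimal sketch, assuming an encoding that is not part of the text:
unit is the string "unit", null is None, abstract variables are Var records named by
their type, objects are Obj records, and SUPER is a hypothetical class table for the
List example.

```python
from dataclasses import dataclass, field

UNIT = "unit"            # dummy value
NULL = None              # null reference
PRIMS = ("int", "bool")  # types of abstract integer and Boolean variables

# Hypothetical class table: class name -> direct superclass.
SUPER = {"List": "Object", "Object": None}


@dataclass
class Var:
    """Abstract variable, named by its type ("int", "bool" or a class name)."""
    ty: str


@dataclass
class Obj:
    """Object (cn, ftable)."""
    cn: str
    ftable: dict = field(default_factory=dict)


def subclass_eq(cn1, cn2):
    """Reflexive closure of the subclass relation."""
    c = cn1
    while c is not None:
        if c == cn2:
            return True
        c = SUPER.get(c)
    return False


def cl(v):
    """Class projection; None where cl is undefined."""
    if isinstance(v, Obj):
        return v.cn
    if isinstance(v, Var) and v.ty not in PRIMS:
        return v.ty
    return None


def is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)


def leq(v, w):
    """v <= w in the sense of Definition 5.5 (w is at least as abstract as v)."""
    if type(v) is type(w) and v == w:
        return True
    w_cls = isinstance(w, Var) and w.ty not in PRIMS
    if isinstance(v, str) and v == UNIT:
        return w is NULL or isinstance(w, Var)
    if v is NULL:
        return w_cls
    if is_int(v) or v == Var("int"):
        return w == Var("int")
    if isinstance(v, bool) or v == Var("bool"):
        return w == Var("bool")
    cn = cl(v)
    return cn is not None and w_cls and subclass_eq(cn, w.ty)


print(leq(3, Var("int")), leq(Obj("List"), Var("Object")), leq(Var("int"), 3))
```

Booleans are tested separately from integers because Python treats bool as a subtype
of int; this is an artefact of the encoding, not of the definition.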

The presence of abstract variables in states allows us to abstract away certain details of a
given state t. This intuition is made precise in the next definition. Let |stk|, |loc| denote
the maximum size of the operand stack and the number of variables respectively. We
make use of the following abbreviation: $w \geq_m v$ if either $w \geq v$ or v, w are references
and we have v = m(w), where m denotes a mapping on references.

Definition 5.6. Let s = (heap, frms, iu) be a state with $frms = [frm_1, \dots, frm_k]$ and
$frm_i = (stk_i, loc_i, cn_i, mn_i, pc_i)$. Furthermore let t = (heap', frms', iu') be a state with
$frms' = [frm'_1, \dots, frm'_k]$ and $frm'_i = (stk'_i, loc'_i, cn'_i, mn'_i, pc'_i)$.

Then s is an abstraction of t (denoted as $s \sqsupseteq t$) if the following conditions hold:

1. for all $1 \leq i \leq k$: $pc_i = pc'_i$, $cn_i = cn'_i$, and $mn_i = mn'_i$,

2. for all $1 \leq i \leq k$: $dom(stk_i) = dom(stk'_i)$ and $dom(loc_i) = dom(loc'_i)$,

3. there exists a mapping $m\colon dom(heap) \to dom(heap')$ such that

• for all $1 \leq i \leq k$, $1 \leq j \leq |stk_i|$: $stk_i(j) \geq_m stk'_i(j)$,

• for all $1 \leq i \leq k$, $1 \leq j \leq |loc_i|$: $loc_i(j) \geq_m loc'_i(j)$,

• for all $a \in dom(heap)$: $heap(a) \geq_m heap'(m(a))$,

• for all $a \in dom(heap)$, such that $ft(heap(a))$ is defined and for all $1 \leq i \leq \ell$:

$f(cn_i, id_i) \geq_m f'(cn_i, id_i)$,

where $f := ft(heap(a))$ with $dom(f) = \{(cn_1, id_1), \dots, (cn_\ell, id_\ell)\}$ and
$f' := ft(heap'(m(a)))$ with $dom(f') = \{(cn_1, id_1), \dots, (cn_\ell, id_\ell)\}$.

4. finally, we have $iu' \supseteq m^*(iu)$.

Example 5.2 (continued from Example 5.1). Consider the states A and B described in
Example 5.1. For the state S depicted below we obtain that $A \sqsubseteq S$ and $B \sqsubseteq S$, i.e., S
forms an abstraction of both states.

  S:  04 | ε | this = o1, ys = o2, cur = o4
      o1 = List(List.val = int, List.next = o3)
      o2 = list, o3 = list, o5 = list
      o4 = List(List.val = int, List.next = o5)



Strictly speaking Definition 5.6 does not apply to JVM states as clearly the latter do
not contain annotations. However, to simplify the notation, we identify JVM states with
abstract states where all references are marked as different in the annotation iu. Thus
the relation $\sqsubseteq$ becomes applicable to relate program states and their abstractions.

While the above form of representing states allows for a succinct presentation, it is
more natural to conceive the heap (and consequently a state) as a graph. In the next
section, we make this intuition precise.

6. Graph-Based Representation of States 

Let s be a state and let heap denote the heap of s. We propose a graph-based
representation of heap and state s called heap graph and state graph respectively. This
representation makes use of a set $\mathcal{I}_{heap}$ of implicit references. Suppose a is an address on
heap, cn a class name, and id a field identifier. Furthermore, suppose ftable = ft(heap(a))
is defined and ftable((cn, id)) = val, such that val is not an address. Then we say the triple
(a, cn, id) is an implicit reference for val; the set $\mathcal{I}_{heap}$ collects all implicit references of
heap.

Definition 6.1. Let heap denote the heap. We represent heap as a directed graph with
edge labels $H = (V_H, Succ_H, L_H, E_H)$, where the nodes, the successor relations and the
labelling functions are defined as follows:

$$
V_H := dom(heap) \cup \mathcal{I}_{heap}
$$

$$
Succ_H(u) :=
\begin{cases}
[f^*(u, (cn_1, id_1)), \dots, f^*(u, (cn_k, id_k))] & \text{if } u \text{ is an address, } ft(heap(u)) = f, \\
 & dom(f) = \{(cn_1, id_1), \dots, (cn_k, id_k)\} \\
[] & \text{otherwise.}
\end{cases}
$$

$$
L_H(u) :=
\begin{cases}
cl(heap(u)) & \text{if } u \text{ is an address} \\
val & \text{if } u \text{ is an implicit reference for } val.
\end{cases}
$$

$$
E_H(u \to v) :=
\begin{cases}
(cn, id) & \text{if } u \text{ is an address, } ft(heap(u)) =: f \text{ and } f^*(u, (cn, id)) = v \\
\epsilon & \text{otherwise.}
\end{cases}
$$

Here $f^*(u, (cn, id)) := f((cn, id))$, if $f((cn, id))$ is an address and $f^*(u, (cn, id)) :=$
$(u, cn, id)$ otherwise, where $(u, cn, id) \in \mathcal{I}_{heap}$.

We usually confuse the heap with its representation as graph. In particular we call a 
value val reachable from an address a in heap, if there exists a path from a to val in 
the heap graph of heap. Based on the graph representation of heap, we represent s as a 
state graph S. 
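
The construction of the heap graph can be sketched as follows. The encoding is an
assumption made for illustration: the heap is a dictionary from addresses to either a
pair (cn, ftable) or the name of an abstract variable, a field value counts as an
address exactly if it is a key of the heap, and edges are collected as triples
(source, label, target). The function name heap_graph is hypothetical.

```python
def heap_graph(heap):
    """Build nodes, successors, node labels and labelled edges of the heap graph."""
    nodes, succ, label, edges = set(heap), {}, {}, []
    for a, obj in heap.items():
        succ[a] = []
        if isinstance(obj, tuple):            # an object (cn, ftable)
            cn, ftable = obj
            label[a] = cn
            for (c, fid), val in ftable.items():
                if val in heap:               # the field holds an address
                    target = val
                else:                         # otherwise use the implicit reference
                    target = (a, c, fid)
                    nodes.add(target)
                    succ[target] = []
                    label[target] = val
                succ[a].append(target)
                edges.append((a, (c, fid), target))
        else:                                 # an abstract variable
            label[a] = obj
    return nodes, succ, label, edges


# The heap of state A from Example 5.1.
heap_a = {
    "o1": ("List", {("List", "val"): "int", ("List", "next"): "o3"}),
    "o2": "list",
    "o3": "list",
}
nodes, succ, label, edges = heap_graph(heap_a)
print(succ["o1"])
```

The order of the successors of an address follows the iteration order of its field
table, which Python dictionaries preserve.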

Let s = (heap, frms, iu) be a state and let $frms = [frm_1, \dots, frm_k]$, such that
$frm_i = (stk_i, loc_i, cn_i, mn_i, pc_i)$. We define the set
$Stk(s) := \{(stk, i, j) \mid 1 \leq i \leq k, 1 \leq j \leq |stk_i|\}$ that collects all stack indices.
Similarly we define the set of register indices:
$Loc(s) := \{(loc, i, j) \mid 1 \leq i \leq k, 1 \leq j \leq |loc_i|\}$. If s is clear from context, we write
Stk (Loc) instead of Stk(s) (Loc(s)). We extend the set $\mathcal{I}_{heap}$ to cover also non-address
values stored in the stack or registers. For this it suffices to extend $\mathcal{I}_{heap}$ by a disjoint
copy of $Stk(s) \cup Loc(s)$. The set of implicit references with respect to s is denoted as $\mathcal{I}_s$.
The copy of a stack or register index in $\mathcal{I}_s$ is called its implicit reference.

Definition 6.2. Let s = (heap, frms, iu) be a state and let H denote the heap graph
of heap. Furthermore, let $(stk, i, j) \in Stk$ and let $(loc, i', j') \in Loc$. We write $os_{i,j}$ and
$l_{i',j'}$ to name the indices of the operation stack and registers.
We define the state graph of s as 5-tuple $S = (V_S, Succ_S, L_S, E_S, iu)$, where the first
four components denote a directed graph with edge labels and iu denotes a set of
annotations. The nodes, the successor relation, and the labelling function of the directed
graph are defined as follows:

$$
V_S := Stk \cup Loc \cup V_H \cup \mathcal{I}_s
$$

$$
Succ_S(u) :=
\begin{cases}
[stk_i^*(j)] & \text{if } u = (stk, i, j) \in Stk \\
[loc_i^*(j)] & \text{if } u = (loc, i, j) \in Loc \\
Succ_H(u) & \text{if } u \in V_H.
\end{cases}
$$

$$
L_S(u) :=
\begin{cases}
os_{i,j} & \text{if } u = (stk, i, j) \in Stk \\
l_{i,j} & \text{if } u = (loc, i, j) \in Loc \\
L_H(u) & \text{if } u \in V_H \\
val & \text{if } u \text{ is an implicit reference for } val.
\end{cases}
$$

$$
E_S(u \to v) :=
\begin{cases}
E_H(u \to v) & \text{if } u, v \in H \\
\epsilon & \text{otherwise.}
\end{cases}
$$

Here $stk_i^*(j)$ and $loc_i^*(j)$ are defined like $f^*$ as introduced in Definition 6.1, i.e.,
$stk_i^*(j) := stk_i(j)$ ($loc_i^*(j) := loc_i(j)$), if $stk_i(j)$ ($loc_i(j)$) is an address, and otherwise
$stk_i^*(j)$ is defined as the implicit reference of $(stk, i, j)$; similarly for $loc_i^*(j)$.

Figure 5: Abstract State A

Figure 6: Abstract State B


We often confuse a state s and its representation as a state graph S. In presenting
state graphs, we indicate references, but do not depict implicit references. The
graph-based representation S provides a much better intuition about the notion of instance
or abstraction of s, cf. Definition 5.6. In particular it will turn out that Definition 5.6
amounts to a variant of graph morphism on two different graph representations of states.

Example 6.1 (continued from Example 5.2). Consider the states A, B, and S presented
in Example 5.2. The state graphs of A and B are given in Figure 5 and Figure 6,
respectively. The state graph of the abstraction S is depicted in Figure 7.



The size of a state is defined on a per-reference basis, which unravels sharing.

Definition 6.3. Let s be a state and let S be its state graph. Let u, v be nodes in
S. We write $u \xrightarrow{P} v$, if there exists a simple path P in S from u to v. Note that P
must not contain cycles. Then the size of a stack or register index u (denoted as $\|u\|$) is
defined as follows:

$$
\|u\| := \sum_{u \xrightarrow{P} v} \|L_S(v)\| ,
$$

where $\|\ell\|$ is $abs(\ell)$ if $\ell \in \mathbb{Z}$, otherwise 1. (As usual abs(z) denotes the absolute value of
the integer z.) Then the size of s is the sum of all sizes of stack or register indices in S.
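
The size measure can be computed by enumerating simple paths. The sketch below rests
on assumptions that the text does not settle: only non-empty simple paths starting in
the index are counted (so the index node itself contributes nothing), two distinct
paths to the same node are counted separately, and the state graph is given by
hypothetical dictionaries succ and label.

```python
def norm(label):
    """Size of a label: the absolute value for integers, 1 for everything else."""
    if isinstance(label, int) and not isinstance(label, bool):
        return abs(label)
    return 1


def size_of_index(succ, label, u):
    """Sum of label sizes over all non-empty simple paths starting in u."""
    total = 0

    def walk(n, on_path):
        nonlocal total
        for v in succ[n]:
            if v in on_path:
                continue              # a simple path must not revisit a node
            total += norm(label[v])
            walk(v, on_path | {v})

    walk(u, {u})
    return total


def size_of_state(succ, label, indices):
    """The size of a state is the sum of the sizes of its stack and register indices."""
    return sum(size_of_index(succ, label, u) for u in indices)


# A concrete variant of state A: this and cur share o1, whose val field holds -4.
succ = {"this": ["o1"], "ys": ["o2"], "cur": ["o1"],
        "o1": ["o1.val", "o3"], "o1.val": [], "o2": [], "o3": []}
label = {"o1": "List", "o1.val": -4, "o2": "list", "o3": "list"}
print(size_of_state(succ, label, ["this", "ys", "cur"]))
```

Since this and cur both reference o1, the cell o1 contributes to the size twice,
which is the unravelling of sharing mentioned before Definition 6.3.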

Based on the relation $\leq$, cf. Definition 5.5, we introduce the following variant of graph
morphism, called state homomorphism.

Definition 6.4. Let S and T be state graphs of states s and t, respectively. A state
homomorphism from S to T (denoted $m\colon S \to T$) is a function $m\colon V_S \to V_T$ such that

1. Stk(s) = Stk(t) =: Stk and Loc(s) = Loc(t) =: Loc,

2. for all $u \in S$ and $u \in Stk \cup Loc$, $u = m(u)$ and $L_S(u) = L_T(m(u))$,

3. for all $u \in S \setminus (Stk \cup Loc)$, $L_S(u) \geq L_T(m(u))$,

4. for all $u \in S$ such that $Succ_S(u) \neq []$, $m^*(Succ_S(u)) = Succ_T(m(u))$, and

5. for all $u \xrightarrow{\ell} v \in S$ and $m(u) \xrightarrow{\ell'} m(v) \in T$, $\ell = \ell'$.

We use $m^*$ to denote the lifting of m to non-empty lists or sets: $m^*([u_1, \dots, u_k]) =$
$[m(u_1), \dots, m(u_k)]$.

If no confusion can arise we refer to a state homomorphism simply as morphism. It is
easy to see that the composition $m_1 \circ m_2$ of two morphisms $m_1$, $m_2$ is again a morphism.

Lemma 6.1. Let s = (heap, frms, iu) and t = (heap', frms', iu') be states, whose state
graphs are denoted by S and T respectively. Suppose the program counters, the class and
method names of all frames coincide. Furthermore suppose that $iu' \supseteq m^*(iu)$ and assume
there exists a morphism $m\colon S \to T$; then $s \sqsupseteq t$.

Proof. Straightforward. □
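
Conditions 2 to 5 of Definition 6.4 are easy to check mechanically for a given
candidate mapping. The sketch below does so for the states A, B, and S of Examples 5.1
and 5.2, under simplifying assumptions that are not part of the text: annotations and
condition 1 are ignored, edge labels are stored per successor position, the empty
string stands for the label of edges leaving a register, and the helper names graph
and is_state_morphism as well as the label comparison geq are hypothetical.

```python
def graph(regs, objs, abstract):
    """Build a state graph for List states: regs maps register names to addresses,
    objs maps addresses of List cells to their next address, and abstract lists
    the addresses holding an abstract variable of type list."""
    g = {"succ": {}, "label": {}, "edge": {}}
    for r, a in regs.items():
        g["succ"][r], g["label"][r], g["edge"][(r, 0)] = [a], r, ""
    for a, nxt in objs.items():
        val = a + ".val"                      # implicit reference of the val field
        g["succ"][a], g["label"][a] = [val, nxt], "List"
        g["edge"][(a, 0)], g["edge"][(a, 1)] = ("List", "val"), ("List", "next")
        g["succ"][val], g["label"][val] = [], "int"
    for a in abstract:
        g["succ"][a], g["label"][a] = [], "list"
    return g


def is_state_morphism(m, S, T, indices, geq):
    """Check conditions 2-5 of Definition 6.4 for a candidate map m from S to T."""
    for u, succs in S["succ"].items():
        if u in indices:
            # Condition 2: indices are fixed and keep their label.
            if m[u] != u or S["label"][u] != T["label"][m[u]]:
                return False
        elif not geq(S["label"][u], T["label"][m[u]]):
            # Condition 3: the label in S is at least as abstract as in T.
            return False
        if succs:
            # Condition 4: successor lists are preserved by m.
            if [m[v] for v in succs] != T["succ"][m[u]]:
                return False
            # Condition 5: corresponding edges carry the same label.
            for k in range(len(succs)):
                if S["edge"][(u, k)] != T["edge"][(m[u], k)]:
                    return False
    return True


REGS = ("this", "ys", "cur")
S = graph({"this": "o1", "ys": "o2", "cur": "o4"}, {"o1": "o3", "o4": "o5"}, ["o2", "o3", "o5"])
A = graph({"this": "o1", "ys": "o2", "cur": "o1"}, {"o1": "o3"}, ["o2", "o3"])
B = graph({"this": "o1", "ys": "o2", "cur": "o3"}, {"o1": "o3", "o3": "o4"}, ["o2", "o4"])

# Labels are equal, or an abstract list variable abstracts a List object.
geq = lambda a, b: a == b or (a == "list" and b == "List")

fixed = {r: r for r in REGS}
m_A = dict(fixed, **{"o1": "o1", "o4": "o1", "o2": "o2", "o3": "o3", "o5": "o3",
                     "o1.val": "o1.val", "o4.val": "o1.val"})
m_B = dict(fixed, **{"o1": "o1", "o3": "o3", "o4": "o3", "o5": "o4", "o2": "o2",
                     "o1.val": "o1.val", "o4.val": "o3.val"})
print(is_state_morphism(m_A, S, A, REGS, geq), is_state_morphism(m_B, S, B, REGS, geq))
```

In the mapping for B the abstract variable at o3 in S is sent to a List cell of B;
this is admissible because condition 4 only constrains nodes with a non-empty
successor list.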

It is an easy consequence of Lemma 6.1 and the composability of morphisms that the
instance relation $\sqsubseteq$ is transitive. Hence the relation $\sqsubseteq$ is a preorder. In the following we
aim to provide a mechanism to widen two distinct abstract states.

Definition 6.5. Let s and s' be states such that there exists an abstraction t of s and
s'. We call t the join of s and s', denoted as $s \sqcup s'$, if t is a least upper bound of {s, s'}
with respect to the preorder $\sqsubseteq$.

In order to prove the existence of $s \sqcup s'$, we identify invariants of abstractions. Let
$S = (V_S, Succ_S, L_S, E_S, iu_S)$ and $S' = (V_{S'}, Succ_{S'}, L_{S'}, E_{S'}, iu_{S'})$ be the state graphs of
the states s and s', respectively. Furthermore let t be an abstraction of s and s' and
$T = (V_T, Succ_T, L_T, E_T, iu_T)$ its state graph. By definition we have the following properties:

1. Let Stk (Loc) collect the stack (register) indices of state s. As $s \sqsubseteq t$, Stk (Loc)
coincides with the set of stack (register) indices of t. Similarly for s', and thus
$V_T \supseteq Stk \cup Loc$.

2. For any node $u \in T$ there exist uniquely defined nodes $v \in V_S$, $w \in V_{S'}$ such that
$L_S(v) \sqsubseteq L_T(u)$ and $L_{S'}(w) \sqsubseteq L_T(u)$. We say the nodes v and w correspond to u.

3. For any node $u \in T$ and any successor u' of u in T there exists a successor v'
(w') in S (S') of the corresponding node v (w) in S (S'). Furthermore v' and w'
correspond to u'.

4. For any edge $u \xrightarrow{\ell} u' \in T$ such that v (w) corresponds to u in S (S') there is an
edge $v \xrightarrow{k} v' \in S$ and an edge $w \xrightarrow{k'} w' \in S'$ such that $\ell = k = k'$.

5. For any annotation $u \neq u' \in iu_T$ there exist $v \neq v'$ in $iu_S$ and $w \neq w'$ in $iu_{S'}$ such
that v (v') and w (w') correspond to u (u').

In order to construct an abstraction t of s and s' we use the above properties as
invariants and define its state graph T by iterated extension. We define $T^0$ by setting
$V_{T^0} := Stk \cup Loc$. Due to Property 1 these nodes exist in S' as well. The labels of
stack or register indices trivially coincide in S and S', cf. Definition 6.4. Thus we set
$L_{T^0}$ accordingly. Furthermore we set $Succ_{T^0} = E_{T^0} = iu_{T^0} := \emptyset$. Then $T^0$ satisfies
Properties 1-5.

Suppose the state graph $T^n$ has already been defined such that Properties 1-5 are
fulfilled. In order to update $T^n$, let $u \in V_{T^n}$ such that v and w correspond to u. Suppose
$v \to v' \in S$ and $w \to w' \in S'$ such that there is no node u' in $T^n$ where v' and w'
correspond to u'. Let u' denote a node fresh to $T^n$. We define $V_{T^{n+1}} := V_{T^n} \cup \{u'\}$ and
establish Property 2 by setting $L_{T^{n+1}}(u')$ such that $L_S(v') \sqsubseteq L_{T^{n+1}}(u')$ and
$L_{S'}(w') \sqsubseteq L_{T^{n+1}}(u')$, where $L_{T^{n+1}}(u')$ is as concrete as possible. If we succeed, we fix
that v' and w' correspond to u'. It remains to update $iu_{T^{n+1}}$ suitably such that Property 5
is fulfilled. If this also succeeds, Properties 1-5 are fulfilled for $T^{n+1}$. On the other hand,
if no further update is possible we set $T := T^n$. By construction T is an abstraction of S
and S' and indeed represents $s \sqcup s'$.
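
The iterated extension can be pictured as a simultaneous traversal of S and S'. The Python fragment below is a simplified sketch in that spirit and rests on assumptions not made in the text: labels are plain values, two distinct labels are always generalised to a single hypothetical abstract label `"?"`, equal labels imply equal outgoing edge labels, and the annotations iu (Property 5) are ignored altogether. The function name `join_graphs` and the data encoding are illustrative only.

```python
ABSTRACT = "?"  # hypothetical label standing for an abstract variable


def join_graphs(S, S2, indices):
    """Join two state graphs given as pairs (labels, succ), where
    succ[node] maps an edge label (e.g. a field name) to the successor node."""
    labels_s, succ_s = S
    labels_s2, succ_s2 = S2
    labels_t, succ_t = {}, {}
    # T^0: the stack and register indices, shared by both graphs
    corr = {(u, u): u for u in indices}  # pair (v, w) -> corresponding node of T
    todo = list(corr)
    while todo:
        v, w = todo.pop()
        u = corr[(v, w)]
        succ_t[u] = {}
        if labels_s[v] != labels_s2[w]:
            # toy generalisation: differing labels collapse to an abstract variable
            labels_t[u] = ABSTRACT
            continue
        labels_t[u] = labels_s[v]
        for field, v2 in succ_s.get(v, {}).items():
            w2 = succ_s2.get(w, {})[field]  # assumes equal labels share their fields
            if (v2, w2) not in corr:
                corr[(v2, w2)] = f"n{len(corr)}"  # node fresh to the current graph
                todo.append((v2, w2))
            succ_t[u][field] = corr[(v2, w2)]
    return labels_t, succ_t


# hypothetical states in the style of the list example: cur points to the
# first cell in A and to the second cell in B
labels_a = {"this": "this", "cur": "cur", "o1": "List", "o2": "?"}
succ_a = {"this": {"val": "o1"}, "cur": {"val": "o1"}, "o1": {"next": "o2"}}
labels_b = {"this": "this", "cur": "cur", "o1": "List", "o2": "List", "o3": "?"}
succ_b = {"this": {"val": "o1"}, "cur": {"val": "o2"},
          "o1": {"next": "o2"}, "o2": {"next": "o3"}}
labels_t, succ_t = join_graphs((labels_a, succ_a), (labels_b, succ_b), ["this", "cur"])
```

Since pairs of corresponding nodes are memoised in `corr`, the traversal also stops on cyclic graphs. Because annotations are dropped, the result only approximates the join described above.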

Definition 6.6. As $\sqcup$ is associative and commutative, we can extend the binary operation
$\sqcup$ to define the least upper bound of a set of states T, denoted as $\bigsqcup T$. We call the
abstraction of the states in T by $\bigsqcup T$ widening.

Example 6.2 (continued from Example 5.2). Consider the states A, B, and S described
in Example 5.2. In Figure 7 an abstraction of A and B is given. In particular, abstraction
S results from the construction defined above, i.e., $S = \bigsqcup \{A, B\}$.

Figure 7: Abstraction S



Let P be a fixed JBC program. In the next section we propose computation graphs as 
finite representations of all possible JVM states of P. The nodes of computation graphs 
are abstract states. 

7. Computation Graphs 

In this section, we define computation graphs as finite representations of the
program flow of a given JBC program. Computation graphs are
strongly related to termination graphs as proposed by Otto et al. [27] and in particular
Brockschmidt et al.

Above, we have already restricted our attention to well-formed JBC programs P using 
the expressions and instructions defined in Section Q] For the proposed static analysis 
of these programs we make the following additional restrictions. First, we restrict to 
non-recursive methods. Note that the abstract states defined above (cf. Definition 5.4)
can in principle express recursive methods, but for recursive methods we cannot use
the construction proposed below to obtain finite computation graphs, as the graphs
defined in Definition 7.4 cannot handle unbounded lists of frames. Second, in the final
transformation to cTRSs, we abstract non-tree-shaped objects: for example, whenever
P creates a cyclic object, we represent it by an abstract variable of the corresponding
class. Note that our notion of abstract states, and thus also computation graphs, can
express non-tree-shaped and even cyclic objects, but we can only represent tree-shaped
objects as terms. Let P denote a well-formed JBC program based on data in tree-shaped
form that only makes use of non-recursive methods. P is fixed for the remainder of this
paper.

Suppose two addresses p and q in a state s could potentially be shared in an instance
of s. Then we call p and q unifiable.

Definition 7.1. Let s = (heap, frms, iu) be a state and let p, q denote distinct addresses
in heap such that $p \neq q \notin iu$. Then we say p and q are unifiable (denoted as $p \stackrel{?}{=} q$)
if there is a JVM state t reachable in P from some initial state start and a morphism
$m\colon s \to t$ such that $m(p) = m(q)$.

Figure [3] presents the single-step execution of the IAdd instruction (see fl~j| for the 
rest). Based on these instructions, and actually mimicking them quite closely, we define
how abstract states are symbolically evaluated. In Figure 8 we have worked out the
cases for the instructions New, Putfield, IfFalse, IAdd, CmpEq, and BAnd. We follow
the notation used in Figure 3 above. The other cases are left to the reader.

Some comments: The symbolic instruction New cn' creates a new entry in the heap
such that a is mapped to a new abstract variable of type cn'. Formally, let a be a new
address and x a new abstract variable of type cn'. We define the mapping heap' such
that $dom(heap') = dom(heap) \cup \{a\}$. Let v be a value and a be an address such that
heap(a) = (cn'', ftable). The application of the instruction Putfield fn cn' is only
possible if there exists no address $p \in heap$ such that $a \stackrel{?}{=} p$. For the IAdd instruction, we
introduce a new abstract integer $i_3$ and the side-condition $i_1 + i_2 = i_3$. Finally, the CmpEq
instruction splits into different cases, depending on the status of the compared values. Note
that CmpEq only performs an equality check on Jinja values, and due to the abstraction on
the heap we have to be careful when comparing addresses:

1. Let $val_1$ and $val_2$ be addresses. If the addresses of $val_1$ and $val_2$ are the same then
the test evaluates to true. Otherwise, we have to check if $val_1$ and $val_2$ unify and
perform an unsharing refinement according to Definition 7.3 if necessary.

2. Without loss of generality let $val_1$ be an address and $val_2$ be null. If $heap(val_1) = obj$
and cl(obj) = cn, we perform an instance refinement according to Definition 7.2 on $val_1$.

3. If $val_1$ and $val_2$ are concrete non-address Jinja values, then the test $(val_1 = val_2)$
can be directly executed and the symbolic execution equals the instruction on the
JVM.

4. If $val_1$ and $val_2$ are abstract Boolean or integer variables, then we introduce a new
Boolean variable $b_3$ and the side condition $(val_1 = val_2) = b_3$. Figure 8 only shows
the latter case.

In addition to symbolic evaluations, we define refinement steps on abstract states 
s if the information given in s is not concrete enough to execute a given instruction. 
Following 0| we make use of class instance and unsharing. Note, that it will be a 
consequence of our definitions that for any refinement s of a state t, we have $s \sqsubseteq t$.

Definition 7.2. Let s = (heap, frms, iu) be a state and let a be an address such
that cl(heap(a)) = cname. Suppose $subclasses := \{cn \mid cn \preceq cname\}$ and let
$cn \in subclasses$. Furthermore, suppose $(cn_1, id_1), \dots, (cn_n, id_n)$ denote the fields of cn
(together with the defining classes).

We obtain the following two refinement steps, where the second takes care of the case
where the abstract variable at address a is replaced by the null pointer.



Here $ftable_1((cn_i, id_i)) := v_i$ such that the type of the abstract variable $v_i$ is defined in
correspondence to the definition of $cn_i$. On the other hand we set $heap_2$ ($frms_2$) equal
to heap (frms), but $a \notin dom(heap_2)$ and all occurrences of a are replaced by null.
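
As an illustration of the two kinds of refinement described in Definition 7.2, the following Python fragment gives a minimal sketch under simplifying assumptions: the heap is a dictionary from addresses to either an abstract variable (a string) or a pair (class name, field table), frames are pairs of a stack list and a register list, null is modelled by `None`, and fresh field contents are fresh variable names. The function `instance_refinements` and the map `fields_of` are hypothetical helpers introduced for this sketch.

```python
import itertools

_fresh = itertools.count()


def instance_refinements(heap, frames, a, subclasses, fields_of):
    """Return one refined state per subclass plus the state for the null case."""
    results = []
    for cn in subclasses:
        # first kind of step: a now refers to an instance of cn whose fields
        # hold fresh abstract variables
        ftable1 = {field: f"x{next(_fresh)}" for field in fields_of[cn]}
        heap1 = dict(heap)
        heap1[a] = (cn, ftable1)
        results.append((heap1, frames))

    # second kind of step: a is removed and every occurrence becomes null
    def to_null(v):
        return None if v == a else v

    heap2 = {}
    for addr, entry in heap.items():
        if addr == a:
            continue
        if isinstance(entry, tuple):
            cn, ftable = entry
            entry = (cn, {field: to_null(v) for field, v in ftable.items()})
        heap2[addr] = entry
    frames2 = [([to_null(v) for v in stk], [to_null(v) for v in loc])
               for stk, loc in frames]
    results.append((heap2, frames2))
    return results


# hypothetical input: a1 is an abstract variable of class A, with subclass B
heap = {"a1": "varA", "a2": ("C", {("C", "f"): "a1"})}
frames = [(["a1"], [7])]
fields_of = {"A": [("A", "x")], "B": [("A", "x"), ("B", "y")]}
refined = instance_refinements(heap, frames, "a1", ["A", "B"], fields_of)
```

Iterating over all subclasses reflects that the dynamic type of the object at address a is unknown; the input state itself is left unchanged.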



New cn':

$$\frac{(heap,\ (stk, loc, cn, mn, pc) :: frms,\ iu)}{(heap'\{a \mapsto x\},\ (a :: stk, loc, cn, mn, pc + 1) :: frms,\ iu)}$$

Putfield fn cn':

$$\frac{(heap,\ (v :: a :: stk, loc, cn, mn, pc) :: frms,\ iu)}{(heap\{a \mapsto (cn'', ftable')\},\ (stk, loc, cn, mn, pc + 1) :: frms,\ iu)}$$

IAdd:

$$\frac{(heap,\ (i_2 :: i_1 :: stk, loc, cn, mn, pc) :: frms,\ iu)}{(heap,\ (i_3 :: stk, loc, cn, mn, pc + 1) :: frms,\ iu)} \quad i_1 + i_2 = i_3$$

IfFalse i:

$$\frac{(heap,\ (false :: stk, loc, cn, mn, pc) :: frms,\ iu)}{(heap,\ (stk, loc, cn, mn, pc + i) :: frms,\ iu)} \qquad \frac{(heap,\ (true :: stk, loc, cn, mn, pc) :: frms,\ iu)}{(heap,\ (stk, loc, cn, mn, pc + 1) :: frms,\ iu)}$$

CmpEq:

$$\frac{(heap,\ (val_2 :: val_1 :: stk, loc, cn, mn, pc) :: frms,\ iu)}{(heap,\ (b_3 :: stk, loc, cn, mn, pc + 1) :: frms,\ iu)} \quad (val_1 = val_2) = b_3$$

BAnd:

$$\frac{(heap,\ (b_2 :: b_1 :: stk, loc, cn, mn, pc) :: frms,\ iu)}{(heap,\ (b_3 :: stk, loc, cn, mn, pc + 1) :: frms,\ iu)} \quad b_2 \wedge b_1 = b_3$$

Figure 8: Symbolic evaluations of Jinja bytecode instructions
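
To make the shape of these symbolic steps concrete, the following Python fragment sketches the cases IAdd, BAnd and IfFalse on a single frame. It is an illustration under assumptions: a frame is a tuple (stk, loc, cn, mn, pc), abstract variables are strings, constraints are returned as plain strings, and heap and annotations are omitted because these three instructions leave them unchanged. The helper names `make_fresh`, `step_iadd`, `step_band` and `step_iffalse` are hypothetical.

```python
import itertools


def make_fresh(start=3):
    """Supply fresh variable names such as i3, b4, ... (naming is arbitrary)."""
    counter = itertools.count(start)
    return lambda prefix: f"{prefix}{next(counter)}"


def step_iadd(frame, fresh):
    stk, loc, cn, mn, pc = frame
    i2, i1, *rest = stk
    i3 = fresh("i")  # new abstract integer
    return ([i3, *rest], loc, cn, mn, pc + 1), f"{i1} + {i2} = {i3}"


def step_band(frame, fresh):
    stk, loc, cn, mn, pc = frame
    b2, b1, *rest = stk
    b3 = fresh("b")  # new abstract Boolean
    return ([b3, *rest], loc, cn, mn, pc + 1), f"{b2} and {b1} = {b3}"


def step_iffalse(frame, offset):
    stk, loc, cn, mn, pc = frame
    top, *rest = stk
    if top is False:
        return (rest, loc, cn, mn, pc + offset)  # jump
    if top is True:
        return (rest, loc, cn, mn, pc + 1)       # fall through
    raise ValueError("this sketch covers only the concrete Boolean cases")


frame = (["i2", "i1", "v"], ["this"], "C", "m", 4)
new_frame, constraint = step_iadd(frame, make_fresh())
```

The returned constraint plays the role of the side-condition that later labels the corresponding edge of the computation graph.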



class A {
  unit m() { unit }
}

class B extends A {
  unit m() { while (true) }
}

class C {
  unit call(A a) { a.m() }
  main() {
    C c = new C();
    c.call(new B());
  }
}

Figure 9: All subclasses need to be considered.

Example 7.1. In Figure 9 we present an example detailing the need for the given
definition of class instantiation. Here class B overrides method m inherited from class
A. We only know the static type of the parameter when analysing method call(A a).
Method call(A a) accepts any instance of class A or of a subclass of A as
parameter, in particular any instance of class B. Due to the overridden method, call(A a)
does not terminate for instances of class B.

Definition 7.3. Let s = (heap, frms, iu) be a state, let S = (V, Succ, L, E, iu) denote its
state graph, and let p and q denote addresses in heap such that $p \stackrel{?}{=} q$, that is, p and
q potentially represent the same object. We obtain the following refinement steps: The
first case forces these addresses to be distinct. The second case substitutes all occurrences
of q with p.

$$\frac{(heap,\ frms,\ iu)}{(heap,\ frms,\ iu \cup \{p \neq q\})} \qquad \frac{(heap,\ frms,\ iu)}{(heap',\ frms',\ iu)} ,$$

where heap' (frms') is equal to heap (frms) with all occurrences of q replaced by p.
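
The two cases of Definition 7.3 can be sketched in Python as follows. This is an illustration under assumptions: heap entries are abstract variables (strings) or pairs (class name, field table), frames are pairs (stack, registers), annotations are a set of unordered pairs, and in the second case the heap entry of q is dropped in favour of that of p. The name `unsharing_refinements` is hypothetical.

```python
def unsharing_refinements(heap, frames, iu, p, q):
    """Return the 'distinct' and the 'merged' refinement of a state."""
    # first case: record that p and q denote distinct objects
    distinct = (heap, frames, iu | {frozenset((p, q))})

    # second case: replace every occurrence of q by p
    def rename(v):
        return p if v == q else v

    heap2 = {}
    for addr, entry in heap.items():
        if addr == q:
            continue  # assumption of this sketch: the entry of p is kept
        if isinstance(entry, tuple):
            cn, ftable = entry
            entry = (cn, {field: rename(v) for field, v in ftable.items()})
        heap2[addr] = entry
    frames2 = [([rename(v) for v in stk], [rename(v) for v in loc])
               for stk, loc in frames]
    merged = (heap2, frames2, iu)
    return distinct, merged


# hypothetical state: p is a list cell whose next field points to q
heap = {"p": ("List", {("List", "next"): "q"}), "q": "l1"}
frames = [(["q"], ["p"])]
distinct, merged = unsharing_refinements(heap, frames, set(), "p", "q")
```

In this toy input, merging q into p turns the list cell into a cyclic object, which shows why the case distinction matters before a Putfield instruction.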

Let s and t be abstract states such that t is obtained from s due to a symbolic
evaluation (cf. Figure 8), a case distinction (Definition 7.2), or an unsharing step
(Definition 7.3). Then we say t is obtained from s by an abstract computation.

Definition 7.4. A computation graph $G = (V_G, E_G)$ is a directed graph with edge labels,
where $V_G$ are abstract states and $s \xrightarrow{\ell} t \in E_G$ if either t is obtained from s by an abstract
computation or s is an instance of t. Furthermore, if there exists a constraint C in
the symbolic evaluation, then $\ell := C$. For all other cases $\ell := \emptyset$. We say that G is
the computation graph of program P if for all initial states start of P there exists an
abstract state $I \in G$ such that $start \sqsubseteq I$.

Example 7.2 (continued from Example 4.1). Consider the List program from Example 4.1
and the corresponding bytecode from Example 4.2. Figure 10 illustrates the
computation graph of append. For the sake of readability we omit the val field of the
list. Note that this graph is not complete, i.e., we omit some intermediate states and do
not illustrate all refinement cases.

First, consider the initial node I. It is easy to see that I is an abstraction of all concrete
initial states. Nodes A, B and S correspond to the situation described in Example 5.1
and Example 5.2. That is, node A is obtained after assigning cur to this before any
iteration of the loop, node B is obtained after exactly one iteration of the loop, and node
$S = \bigsqcup \{A, B\}$. We usually do not consider intermediate iterations. That is why node B
is illustrated by a dashed border.

After pushing the reference of cur.next and null onto the operand stack, we reach
node C. At pc = 7 we want to compare the reference of cur.next with null. But
cur.next is not concrete. Therefore, a class instance refinement is performed, yielding
nodes C' and C''.
First, we consider that cur.next is not null, but references an arbitrary instance. This
is illustrated in node C'. The step from C' to D is trivial. Let id denote the identity
function and $m = id(V_S)$. Then $m\{o_4 \mapsto o_5, o_5 \mapsto o_6\}$ is a morphism from S to D.
Therefore, D is an instance of S. Second, we consider the case when cur.next is null,
which is depicted in node C''. Node E is obtained from C'' after loading registers cur
and ys onto the stack. At pc = 14 a Putfield instruction is performed. Therefore we
perform a refinement according to Definition 7.3. We just illustrate two different cases,
E' and E'' respectively. Nodes F' and F'' are obtained after performing the Putfield
instruction.

The above definition of computation graph is non-constructive. First, the definition 
of unifier demands to check all reachable instances of the given state. This is clearly 
non-computable. However, we can always approximate the unifier by using standard 
unification arguments, which is precisely what we do in our implementation. Alterna- 
tively, we could employ annotations as in 

0,110,1. Second, we have not yet clarified 



how widening is performed in general. We can make this step constructive by the use
of the join operation defined in Definition 6.6. Whenever we are about to finish a loop,
we attempt to use an instance refinement to the state starting this loop. If this fails, for
example in an attempted step from B to A in Example 7.2, we widen the corresponding
state. Here we collect all states that need to be abstracted and join them to obtain an
abstraction. Complementing the proposed widening strategy, we restrict the applications
of class instance and unsharing suitably, such that these refinement steps are only
performed if no other steps are applicable.

The next lemma shows that if this strategy is followed we are guaranteed to obtain a 
finite computation graph. 

Lemma 7.1. Let G be the computation graph of a program P such that in the construction 
of G the above widening construction is applied whenever possible. Then G is finite. 

Proof. We argue indirectly. Suppose the computation graph G of P is infinite. This
is only possible if there exists an initial state start of P that is non-terminating. This
implies that starting from start we reach a loop in P that is called infinitely often. As
G is infinite this implies that the widening operation for this loop gives rise to an infinite
sequence of states $(s_i)_{i \geq 0}$ such that $s_i \sqsubseteq s_{i+1}$ for all i. However, this is impossible as any
ascending chain of abstract states is finite, as for all i: $\|s_i\| > \|s_{i+1}\|$. □

Let G be a computation graph. We write $s \to_G t$ to indicate that state t is directly
reachable in G from s. Sometimes we want to distinguish whether t is obtained by an
abstract computation (denoted as $s \to_{cmp} t$) or by a widening step (denoted as $s \to_{wid} t$).
If t is reachable from s in G we write $s \to^{*}_G t$. If $s \neq t$ this is denoted as $s \to^{+}_G t$. If at
most one step was performed we write $s \to^{=}_G t$.

Lemma 7.2. Let s be a state and let s' be a JVM state such that $s' \sqsubseteq s$. Then $P\colon s' \to_1 t'$
implies the existence of a state t such that $t' \sqsubseteq t$ and $s \to^{*}_{cmp} \cdot \to_{cmp} \cdot \to^{=}_{wid} t$.

Proof. We proceed by case distinction on the instruction executed to perform
$P\colon s' \to_1 t'$. This will show the existence of a state t such that $t' \sqsubseteq t$ where
$s \to^{*}_{cmp} \cdot \to_{cmp} t$ holds. More precisely, at most two refinement steps may be necessary before
the JVM instruction is symbolically evaluated in G. In order to reach from t a widened
state in G an instance edge may be necessary. This explains the optional widening step
proposed in the lemma.

Figure 10: The (incomplete) computation graph of List.append.

By assumption $s' \sqsubseteq s$; let s' = (heap', frm' :: frms', iu') and s = (heap, frm ::
frms, iu). We only treat some informative cases and employ the notation from Figure 8.

• Consider Load n. By definition of $\sqsubseteq$ the number of registers in frm' and frm
coincide. In particular loc(n) is defined and its value can be symbolically loaded
to obtain an abstract state t such that $t' \sqsubseteq t$ holds.

• Consider Store n. By definition of $\sqsubseteq$ the number of stack elements in frm' and
frm coincide. Furthermore the number of registers coincides. Hence the instruction
is executed symbolically to obtain a state t such that $t' \sqsubseteq t$ holds.

• Consider Push v. The instruction can be directly executed, as v is a concrete value.

• Consider Pop. Similar to the Push instruction, but the value v may now be an
abstract value.

• Consider New cn'. Instead of creating a new object of class cn', a new abstract
variable of type cn' is created.

• Consider Getfield fn cn'. By definition of $\sqsubseteq$ the number of stack elements in
frm' and frm coincide. Furthermore a' is an address in state s'. Hence the
corresponding element a on the stack in state s is an address, too. However, the
value heap(a) may be abstract and need not contain the field fn. In order to create
this field, a class instance refinement needs to be invoked. After this refinement,
the instruction can be executed.

• Consider Putfield fn cn'. Analogous to the case of Getfield, but the crucial
difference to the Getfield instruction is that Putfield can only be symbolically
executed if fn is only reachable via the address a. This guarantees $t' \sqsubseteq t$ but may
require additional unsharing refinement steps.

• Consider Checkcast cn'. By assumption on P the cast check is void. Hence the
(abstract) state remains unchanged.

• Consider Invoke mn' n. By definition of $\sqsubseteq$ the number of stack elements in frm'
and frm coincide. Hence method mn' can be directly invoked in the abstract state
s. Let a denote the address of the calling object and $p_0, \dots, p_{n-1}$
the parameters in s. Then the top-frame in the resulting state t has the following
form:

$$frm'' = ([\,],\ [a, p_0, \dots, p_{n-1}] \mathbin{@} units,\ cn'',\ mn',\ 0) ,$$

where cn'' is suitably defined. As the address a and the $p_0, \dots, p_{n-1}$ are generalisations
of the corresponding address and parameters in s' we obtain $t' \sqsubseteq t$.

• Consider Return. Without loss of generality, we assume that the frame stack
contains at least two elements. Let $frm'_1$ and $frm'_2$ denote the second frame in s'
and the resulting transformation of this frame in t', respectively. Furthermore let
$frm_1$ denote the generalisation of $frm'_1$ in the abstract state s. By definition of $\sqsubseteq$
the number of stack elements in $frm'_1$ and $frm_1$ coincide. Hence the execution can
be performed symbolically so that the frame $frm_2$ is obtained as a transformation
of $frm_1$. We obtain $frm'_2 \sqsubseteq frm_2$.

• Consider IAdd. Let $i_2, i_1$ denote the first two stack elements of s. By definition of
the symbolic execution of IAdd we perform the step by introducing a new abstract
integer $i_3$ and adding the constraint $i_3 = i_1 + i_2$. The thus obtained state t is more
general than the concrete state t' as for any number z and any abstract integer $i_3$
we have $z \sqsubseteq i_3$.

• Consider IfFalse i. Without loss of generality let false denote the first element on
the stack of s. Executing the symbolic step yields a state t, which is a generalisation
of t' by assumption on s' and s.

• Consider CmpEq. The case is similar to the symbolic step IAdd.

• Consider BAnd. The case is similar to the symbolic step IAdd.

• Consider Goto i. As the state is essentially unchanged for the Goto instruction, the
lemma follows immediately.

□ 

We arrive at the main result of this section. 

Theorem 7.1. Let s' and t' be JVM states, where s' is reachable from some initial state
start. Suppose $P\colon s' \to^{*} t'$, where the runtime of the execution is m. Let G denote the
computation graph of P. Then there exists an abstraction s of s' and an abstraction t of
t' such that $s \to^{*}_G t$ holds. Moreover let m' denote the length of the path from s to t in
G. Then $m \leq m' \leq 4m$.

Proof. By assumption we have $P\colon start \to^{*} s'$ for some initial state start of P. By
definition of G there exists an abstract state $I \in G$ such that $start \sqsubseteq I$. By induction
on the number of evaluation steps from start to s' in conjunction with Lemma 7.2 we
conclude the existence of a state $s \in G$ such that $s' \sqsubseteq s$.

By induction on m (again employing Lemma 7.2), we conclude the existence of states
s and t such that $s \to^{*}_G t$. Hence, the first part of the theorem follows. Furthermore by
Lemma 7.2 we directly obtain that $m \leq m'$. Moreover, by the proof of Lemma 7.2 we see
that each evaluation step of the JVM is simulated by a path of length at most 4 in G:
at most two refinement steps, one symbolic evaluation, and at most one widening step.
Hence $m' \leq 4m$. □

The above theorem cannot be directly employed to show that the transformation to 
computation graphs is non-termination preserving, as we have reasoned about a given 
(finite) computation in P. However, the latter follows easily by an indirect argument. 

Corollary 7.1. The transformation to computation graphs is non-termination preserving.

Proof. Suppose there exists an infinite run in P, but any path in G is finite. Let start
be some initial state of P. By Theorem 7.1 there exists a state t' such that
$P\colon start \to^{*} t'$ and a node $t \in G$ with $t' \sqsubseteq t$. Furthermore, as all paths in G are finite,
we can assume t is the last node of such a path. However, as t' is non-terminating, there
exists a successor of t', but by assumption there is no successor of t in G. This contradicts
Lemma 7.2. □



8. Constrained Rewrite Systems 

Let G be the computation graph for program P with initial state I; G is kept fixed for 
the remainder of the section. In the following we describe the translation from G into 
a constrained term rewrite system (cTRS for short). Our definition is a variation of
cTRSs as for example defined by Falke and Kapur (a, E3| or Sakata et al. (30| . The here 
proposed transformation is inspired by (2t1 |. Otto et al. transform termination graphs 



into integer term rewrite systems (ITRSs for short) [111 ]. In contrast to this translation, 
our representation of states is technically simpler, as our abstract states allow a simple
graph-based representation (cf. Definition 6.2).

Let $\mathcal{C}$ be a (not necessarily finite) sorted signature, and let $\mathcal{V}'$ denote a countably infinite set
of sorted variables. Furthermore let $\mathsf{T}$ denote a theory over $\mathcal{C}$. Quantifier-free formulas
over $\mathcal{C}$ are called constraints. Suppose $\mathcal{F}$ is a sorted signature that extends $\mathcal{C}$ and let
$\mathcal{V} \supseteq \mathcal{V}'$ denote an extension of the variables in $\mathcal{V}'$. Let $\mathcal{T}(\mathcal{F}, \mathcal{V})$ denote the set of
(sorted) terms over the signature $\mathcal{F}$ and $\mathcal{V}$. Note that the sorted signature is necessary
to distinguish between theory variables that are to be interpreted over the theory $\mathsf{T}$
and term variables whose interpretation is free. A constrained rewrite rule, denoted as
$l \to r\ [C]$, is a triple consisting of terms l and r, together with a constraint C. We assert
that $l \notin \mathcal{V}$, but do not require that $Var(l) \supseteq Var(r) \cup Var(C)$, where Var(t) (Var(C))
denotes the variables occurring in the term t (constraint C). A constrained term rewrite
system (cTRS) is a finite set of constrained rewrite rules. The proposed notion of cTRSs
is inspired by 0| and in turn influenced (la ]. 

Let $\mathcal{R}$ denote a cTRS. A context D is a term with exactly one occurrence of a hole
□, and D[t] denotes the term obtained by replacing the hole □ in D by the term t. A
substitution $\sigma$ is a function that maps variables to terms, and $t\sigma$ denotes the homomorphic
extension of this function to terms. We define the rewrite relation $\to_{\mathcal{R}}$ as follows.
For terms s and t, $s \to_{\mathcal{R}} t$ holds if there exists a context D, a substitution $\sigma$, and a
constrained rule $l \to r\ [C] \in \mathcal{R}$ such that $s = D[l\sigma]$ and $t = D[r\sigma]$ with $\mathsf{T} \vdash C\sigma$. For
extra variables x possibly occurring in t we demand that (i) $\sigma(x)$ is in normal form and
(ii) that $|\sigma(x)|$ is bounded by $|l\sigma| + |r'\sigma|$, where r' is obtained from r by replacing all
extra variables with the constant □. Here |t| denotes a suitable measure of the term
complexity of t. We fix a specific measure below. Note that condition (i) is essential to
ensure termination, while condition (ii) is essential to guarantee that the rewrite relation
is finitely branching.

We often drop the reference to the cTRS $\mathcal{R}$ if no confusion can arise from this. A
function symbol f in $\mathcal{F}$ is called defined if f occurs as the root symbol of l, where
$l \to r\ [C] \in \mathcal{R}$. Function symbols in $\mathcal{F} \setminus \mathcal{C}$ that are not defined are called constructor
symbols, and the symbols in $\mathcal{C}$ are called theory symbols.

A cTRS $\mathcal{R}$ is called terminating if the relation $\to_{\mathcal{R}}$ is well-founded. For a terminating
cTRS $\mathcal{R}$, we define its runtime complexity, denoted as $rc_{\mathcal{R}}$. We adapt the runtime
complexity with respect to a standard TRS suitably for cTRS $\mathcal{R}$. (See the literature for
the standard definition.) The derivation height of a term t (with respect to $\mathcal{R}$) is defined
as the maximal length of a derivation (with respect to $\mathcal{R}$) starting in t. The derivation
height of t is denoted as dh(t).

Definition 8.1. We define the runtime complexity (with respect to $\mathcal{R}$) as follows:

$$rc_{\mathcal{R}}(n) := \max\{dh(t) \mid t \text{ is basic and } |t| \leq n\} ,$$

where a term $t = f(t_1, \dots, t_k)$ is called basic if f is defined, and the terms $t_i$ are only
built over constructors, theory symbols, and variables.
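
As a worked illustration of Definition 8.1, consider a hypothetical cTRS (not taken from the paper) with the single constrained rule f(x) → f(x') [x > 0 ∧ x' = x − 1]. The Python sketch below computes derivation heights for basic terms f(z) with an integer argument z and takes |f(z)| = 1 + abs(z) as term size; the function names are illustrative.

```python
def step(z):
    """One rewrite step on f(z) with the hypothetical rule
    f(x) -> f(x') [x > 0 and x' = x - 1]; None if the constraint fails."""
    return z - 1 if z > 0 else None


def derivation_height(z):
    # the rule is deterministic, so the unique derivation is also the longest
    steps = 0
    while (z := step(z)) is not None:
        steps += 1
    return steps


def runtime_complexity(n):
    # basic terms f(z) of size 1 + abs(z) <= n, i.e. abs(z) <= n - 1
    candidates = range(-(n - 1), n)
    return max((derivation_height(z) for z in candidates), default=0)
```

For this rule the derivation height of f(z) is max(z, 0), so the runtime complexity is linear in n.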

In the following we are only interested in cTRSs over a specific theory $\mathsf{T}$, namely
Presburger arithmetic, that is, we have $\mathsf{T} \vdash C$ if all ground instances of the constraint
C are valid in Presburger arithmetic. Recall that Presburger arithmetic is decidable. If
$\mathsf{T} \vdash C$, then C is valid. On the other hand, if there exists a substitution $\sigma$ such that
$\mathsf{T} \vdash C\sigma$, then C is satisfiable.

To represent the basic operations in the Jinja bytecode instruction set (cf. Figure 3)
we collect the following connectives and truth constants in $\mathcal{C}$: $\wedge$, $\vee$, $\neg$, true, and false,
together with the following relations and operations: $=$, $\neq$, $\geq$, $+$, $-$. Furthermore, we
add infinitely many constants to represent integers. We often write $l \to r$ instead of
$l \to r\ [true]$. As expected $\mathcal{C}$ makes use of two sorts: bool and int. We suppose that
all abstract variables $x_1, x_2, \dots$ are present in the set of variables $\mathcal{V}$, where abstract
integer (Boolean) variables are assigned sort int (bool) and all other variables are assigned
sort univ. The remaining elements of the signature $\mathcal{F}$ will be defined in the course of
this section. As the signature of these function symbols is easily read off from the
translation given below, in the following the sort information is left implicit, to simplify
the presentation.

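The size of a term t, denoted as |t|, is defined as follows:

$$|t| := \begin{cases} 1 & \text{if } t \text{ is a variable} \\ abs(t) & \text{if } t \text{ is an integer} \\ 1 + \sum_{i=1}^{n} |t_i| & \text{if } t = f(t_1, \dots, t_n) \text{ and } f \text{ is not an integer.} \end{cases}$$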
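
The size measure translates directly into a recursive function. The Python sketch below assumes a hypothetical term encoding: variables are strings, integers are Python integers, and a compound term f(t_1, ..., t_n) is the tuple `(f, t_1, ..., t_n)`, so that a constant such as nil is the one-element tuple `("nil",)`.

```python
def term_size(t):
    if isinstance(t, int):
        return abs(t)      # integers count with their absolute value
    if isinstance(t, str):
        return 1           # variables have size 1
    _symbol, *args = t     # compound term or constant
    return 1 + sum(term_size(arg) for arg in args)


# the list [5, x] written with the symbols :: and nil
example = ("::", 5, ("::", "x", ("nil",)))
```

Here `term_size(example)` evaluates to 1 + 5 + (1 + 1 + 1) = 9.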

The next definition embodies the fact that only tree-shaped objects can be represented 
as terms. 

Definition 8.2. Let s = (heap, frms, iu) be a state reachable from the initial state I
in G and let $a \in dom(heap)$. Suppose there is a JVM state t = (heap', frms') reachable
in P from some initial state start and a morphism $m\colon s \to t$. We call a special, if one of
the following conditions is fulfilled:

1. either $m(a) \rightharpoonup^{+} m(a)$, that is m(a) is cyclic in t, or

2. suppose heap'(m(a)) = (cn, ftable) such that ftable((cn, id)) = ftable((cn', id'))
for two distinct pairs (cn, id), (cn', id'), that is the object heap'(m(a)) is not
tree-shaped.

In the next definition, we show how a state becomes representable as a term over $\mathcal{F}$.

Definition 8.3. Let s = (heap, frms, iu) be a state and let the index sets Stk and Loc
be defined as above. Suppose v is a value. Then the value v is translated as follows:

$$tval(v) := \begin{cases} null & \text{if } v \in \{unit, null\} \\ v & \text{if } v \text{ is a non-address value, except unit or null} \\ taddr(v) & \text{if } v \text{ is an address.} \end{cases}$$

Let a be an address. Then a is translated as follows:

$$taddr(a) := \begin{cases} x & \text{if } a \text{ is special (cf. Definition 8.2) and } x \text{ is a fresh variable} \\ x & \text{if } heap(a) \text{ denotes an abstract variable } x \\ cn(tval(v_1), \dots, tval(v_n)) & \text{if } heap(a) = (cn, ftable). \end{cases}$$

Here we suppose in the last case that $dom(ftable) = \{(cn_1, id_1), \dots, (cn_n, id_n)\}$ and for
all $1 \leq i \leq n$: $ftable((cn_i, id_i)) = v_i$. Finally, to translate the state s into a term, it
suffices to translate the values of the registers and the operand stacks of all frames in the
list frms. Let $(stk, i, j) \in Stk$ such that $stk_i(j)$ denotes the j-th value in the operand
stack of the i-th frame in frms. Similarly for $(loc, i', j') \in Loc$. Then we set

$$ts(s) = [tval(stk_1(1)), \dots, tval(stk_k(|stk_k|)), tval(loc_1(1)), \dots, tval(loc_k(|loc_k|))] ,$$

where the list [...] is formalised by an auxiliary binary symbol :: and the constant nil.
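
The translation of Definition 8.3 can be sketched in Python as follows. The encoding is an assumption made for illustration: addresses are string keys of the heap, an abstract variable is a string stored in the heap, an object is a pair (class name, field table), a frame is a pair (stack, registers), and the set of special addresses is supplied by the caller rather than computed. The result is returned as a Python list instead of a term built from :: and nil.

```python
import itertools


def make_translator(heap, special=()):
    """Build the map ts for a fixed heap; `special` lists the special addresses."""
    fresh = (f"z{i}" for i in itertools.count())  # supply of fresh variables

    def tval(v):
        if v in ("unit", "null"):
            return ("null",)
        if isinstance(v, str) and v in heap:  # v is an address
            return taddr(v)
        return v  # any other non-address value

    def taddr(a):
        if a in special:
            return next(fresh)        # cyclic or shared object: fresh variable
        entry = heap[a]
        if isinstance(entry, str):
            return entry              # abstract variable
        cn, ftable = entry
        return (cn, *[tval(v) for v in ftable.values()])

    def ts(frames):
        stacks = [tval(v) for stk, _loc in frames for v in stk]
        registers = [tval(v) for _stk, loc in frames for v in loc]
        return stacks + registers

    return ts


# hypothetical heap: a1 is a list cell with abstract tail l1, a3 is cyclic
heap = {
    "a1": ("List", {("List", "next"): "a2"}),
    "a2": "l1",
    "a3": ("List", {("List", "next"): "a3"}),
}
ts = make_translator(heap, special={"a3"})
term = ts([(["a2", "null"], ["a1", "unit", 3])])
```

Without the `special` set the translation of the cyclic address a3 would not terminate, which mirrors the restriction to tree-shaped objects.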

Example 8.1 (continued from Example 7.2). Consider the simplified presentation of
state C in Figure 10. Then ts(C) yields the following term:

[list, null, List(list), list, List(List(list))] .

Note that we can omit the information of the defining classes of the fields, since this
is already captured in the symbolic evaluation. Furthermore, observe that our term
representation can only fully represent tree-shaped data structures. In this sense, the
term representation of a state s is less general than its graph-based representation.
However, we still obtain the following lemma.

Lemma 8.1. Let s and t be states. If $t \sqsubseteq s$, then there exists a substitution $\sigma$ such that
$ts(t) = ts(s)\sigma$.

Proof. Let S and T be the state graphs of s and t, respectively. By assumption there
exists a morphism $m\colon S \to T$. The lemma is a direct consequence of the following two
observations:

• Consider the terms ts(s) and ts(t). By definition these terms encode the standard
term representations of the graphs S and T.

• Let u and v be nodes in S and T such that m(u) = v. The label of u (in S) can
only be distinct from the label of v (in T), if $L_S(u)$ is an abstract variable or null.
In the former case $tval(L_S(u))$ is again a variable and the latter case implies that
$L_T(v) = unit$. Thus in both cases, $tval(L_S(u))$ matches $tval(L_T(v))$.

□ 

The next lemma relates the size of a state to its term representation and vice versa. 

Lemma 8.2. Let s = (heap, frms, iu) be a state such that heap does not admit special
references. Then $|ts(s)| = \|s\|$.

Proof. As a consequence of Definition 6.3 and the above proposed variant of the term
complexity we obtain $|ts(s)| \leq \|s\|$ for all states s. For the other direction observe that
in the absence of special references no information is lost in the term representation and
thus $\|s\| \leq |ts(s)|$. □

Definition 8.4. Let $s = (\mathit{heap}, \mathit{frms}, \mathit{iu})$ be a state reachable from $I$ and let $p$, $q$
denote distinct addresses in $\mathit{heap}$. Suppose the current instruction in $s$ is a Putfield
instruction that alters address $p$ and $\mathit{heap}(q)$ is an abstract variable for some class.
Furthermore, suppose there is a JVM state $t = (\mathit{heap}', \mathit{frms}')$ reachable in $P$ from start
and a morphism $m\colon s \to t$. Then we say that $p$ and $q$ are joinable (denoted as $p \bowtie q$) if
$m(p) \to^{*} a \;{}^{*}\!\!\leftarrow m(q)$ for some address $a \in \mathsf{dom}(\mathit{heap}')$.

Let $G$ be a computation graph. For any state $s$ in $G$ we introduce a new function
symbol $f_s$. Suppose $\mathsf{ts}(s) = [s_1, \ldots, s_n]$. To ease presentation we write $f_s(\mathsf{ts}(s))$ instead
of $f_s(s_1, \ldots, s_n)$.

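Definition 8.5. Let $G$ be a computation graph and let $s$ and $t$ be states in $G$. We define
the constrained rule corresponding to the edge $(s,t)$ (denoted as $\mathsf{rule}(s,t)$) as follows:

\[
\mathsf{rule}(s,t) =
\begin{cases}
f_s(\mathsf{ts}(s)) \to f_t(\mathsf{ts}(s)) & \text{if } s \sqsubseteq t \text{ (cf. the definition of the instance relation)} \\
f_s(\mathsf{ts}(t)) \to f_t(\mathsf{ts}(t)) & \text{if } t \text{ is a class instance or unsharing refinement of } s \text{ (cf. Definitions 7.2 and 7.3)} \\
f_s(\mathsf{ts}(s)) \to f_t(\mathsf{ts}(t)) \; [\![\mathsf{tval}(C)]\!] & \text{if the edge between } s \text{ and } t \text{ is labelled by } C \\
f_s(\mathsf{ts}(s)) \to f_t(\mathsf{ts}^*(t)) & \text{if } s \text{ corresponds to a Putfield instruction on the address } p \text{ and there exists} \\
 & \text{an address } q \text{ in } s \text{ such that } p \bowtie q \text{ (cf. Definition 8.4)} \\
f_s(\mathsf{ts}(s)) \to f_t(\mathsf{ts}(t)) & \text{otherwise}
\end{cases}
\]

Here $\mathsf{tval}(C)$ denotes the standard extension of the mapping $\mathsf{tval}$ to labels of edges and
$\mathsf{ts}^*$ is defined as $\mathsf{ts}$ but employs fresh variables for any reference $q$.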
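The case distinction of Definition 8.5 can be sketched as a small function that prints the rule for one edge. This is only an illustration: the `Edge` record, its field names and the edge-kind strings are hypothetical, term representations are passed in as already printed strings, and the instance case follows the reading in which ts(s) appears on both sides of the rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edge:
    src: str                            # name of state s
    tgt: str                            # name of state t
    kind: str                           # "instance", "refinement", "eval" or "putfield_joinable"
    ts_src: str                         # printed ts(s)
    ts_tgt: str                         # printed ts(t)
    ts_star_tgt: Optional[str] = None   # printed ts*(t), needed for "putfield_joinable"
    constraint: Optional[str] = None    # printed tval(C), if the edge is labelled


def rule(e: Edge) -> str:
    """Print the constrained rule for one edge, following the five cases."""
    if e.kind == "instance":            # s is an instance of t: arguments are passed on unchanged
        lhs, rhs = e.ts_src, e.ts_src
    elif e.kind == "refinement":        # both sides use the refined state t
        lhs, rhs = e.ts_tgt, e.ts_tgt
    elif e.kind == "putfield_joinable": # right-hand side uses fresh variables via ts*
        if e.ts_star_tgt is None:
            raise ValueError("ts*(t) is required for a joinable Putfield edge")
        lhs, rhs = e.ts_src, e.ts_star_tgt
    else:                               # evaluation edge, with or without constraint
        lhs, rhs = e.ts_src, e.ts_tgt
    text = f"f_{e.src}({lhs}) -> f_{e.tgt}({rhs})"
    if e.constraint is not None:
        text += f" [{e.constraint}]"
    return text


instance_edge = Edge("A04", "S04", "instance",
                     "[null, null, List(l0), l1, List(l0)]",
                     "[null, null, List(l0), l1, List(l2)]")
print(rule(instance_edge))
```

The sample edge mirrors the step from state A04 to state S04 in the append example; the state names are used here purely as labels.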
Example 8.2 (continued from Example 6.2). Consider the figures from Example 6.2,
including Figure 10. We use the following conventions: List variables are denoted by l, followed by
a number. The function symbols contain a state from the computation graph as well
as the corresponding program position from the bytecode. The translation results in the
following cTRS rules (see Figure 11).

Lemma 8.3. Let $s$ and $t$ be states in $G$ connected by an edge $s \to t$ from $s$ to $t$. Suppose
$s'$ is a JVM state with $s' \sqsubseteq s$. Suppose further that if the constraint $\ell$ labelling the edge
is non-empty, then $s'$ satisfies $\ell$. Moreover, if $s \to t$ follows due to a refinement step,
then $s'$ is consistent with the chosen refinement. Then there exists a JVM state $t' \sqsubseteq t$,
such that $f_s(\mathsf{ts}(s')) \to_{\mathsf{rule}(s,t)} f_t(\mathsf{ts}(t'))$.

Proof. The proof proceeds by case analysis on the edge $s \to t$ in $G$, where we only need
to consider the following four cases. The argument for the omitted fifth case is very
similar to the third case.

• Case $s \xrightarrow{\ell} t$, as $s \sqsubseteq t$; $\ell = \varnothing$. By assumption $s' \sqsubseteq s \sqsubseteq t$. Hence $s' \sqsubseteq t$ by
transitivity of the instance relation. By Lemma 8.1 there exists a substitution $\sigma$,
such that $\mathsf{ts}(s') = \mathsf{ts}(s)\sigma$. In sum, we obtain:

\[ f_s(\mathsf{ts}(s')) = f_s(\mathsf{ts}(s))\sigma \to_{\mathsf{rule}(s,t)} f_t(\mathsf{ts}(s))\sigma = f_t(\mathsf{ts}(t')) , \]

where we set $t' := s'$.

• Case $s \xrightarrow{\ell} t$, as $t$ is a refinement of $s$; $\ell = \varnothing$. By assumption $s' \sqsubseteq s$ and $s'$ is
concrete. Hence $s' \sqsubseteq t$ by definition of $t$. Again by Lemma 8.1 there exists a
substitution $\sigma$, such that $\mathsf{ts}(s') = \mathsf{ts}(t)\sigma$. In sum, we obtain:

\[ f_s(\mathsf{ts}(s')) = f_s(\mathsf{ts}(t))\sigma \to_{\mathsf{rule}(s,t)} f_t(\mathsf{ts}(t))\sigma = f_t(\mathsf{ts}(t')) , \]

where we again set $t' := s'$.

• Case $s \xrightarrow{\ell} t$, as $t$ is the result of the symbolic evaluation of $s$ and $\ell = C \neq \varnothing$. By
assumption $s'$ satisfies the constraint $C$. More precisely, there exists a substitution
$\sigma$, such that $\mathsf{ts}(s') = \mathsf{ts}(s)\sigma$ and $\mathcal{T} \vdash \mathsf{tval}(C)\sigma$. We obtain:

\[ f_s(\mathsf{ts}(s')) = f_s(\mathsf{ts}(s))\sigma \to_{\mathsf{rule}(s,t)} f_t(\mathsf{ts}(t))\sigma . \]

Let $t'$ be defined such that $P\colon s' \to_1 t'$. By Lemma 7.2 we obtain $t' \sqsubseteq t$ and
by inspection of the proof of Lemma 7.2 we observe that $\mathsf{ts}(t') = \mathsf{ts}(t)\sigma$. In sum
$f_s(\mathsf{ts}(s')) \to_{\mathsf{rule}(s,t)} f_t(\mathsf{ts}(t'))$.

• Case $s \to t$, as $t$ is the result of a Putfield instruction on $p$ and there exists an
address $q$ in $s$ with $p \bowtie q$. By assumption $s' \sqsubseteq s$ and thus $\mathsf{ts}(s') = \mathsf{ts}(s)\sigma$ for some
substitution $\sigma$. Let $t'$ be defined such that $P\colon s' \to_1 t'$. Due to Lemma 7.2 we
have $t' \sqsubseteq t$ and thus there exists a substitution $\tau$ such that $\mathsf{ts}(t') = \mathsf{ts}^*(t)\tau$.

Consider the rule $f_s(\mathsf{ts}(s)) \to f_t(\mathsf{ts}^*(t))$. By definition the address $q$ points in $s$ to an
abstract variable $x$ such that $x$ occurs in $\mathsf{ts}(s)$ and $\mathsf{ts}(t)$. Furthermore $x$ is replaced

f_I00([null, null, List(l0), l1, null]) → f_I01([List(l0), null, List(l0), l1, null])
f_I01([List(l0), null, List(l0), l1, null]) → f_I02([null, null, List(l0), l1, List(l0)])
f_I02([null, null, List(l0), l1, List(l0)]) → f_I03([null, null, List(l0), l1, List(l0)])
f_I03([null, null, List(l0), l1, List(l0)]) → f_A04([null, null, List(l0), l1, List(l0)])
f_A04([null, null, List(l0), l1, List(l0)]) → f_S04([null, null, List(l0), l1, List(l0)])
f_S04([null, null, List(l0), l1, List(l2)]) → f_S05([List(l2), null, List(l0), l1, List(l2)])
f_S05([List(l2), null, List(l0), l1, List(l2)]) → f_S06([l2, null, List(l0), l1, List(l2)])
f_S06([l2, null, List(l0), l1, List(l2)]) → f_C07([l2, null, List(l0), l1, List(l2)])
f_C07([List(l3), null, List(l0), l1, List(List(l3))]) → f_C'07([List(l3), null, List(l0), l1, List(List(l3))])
f_C07([null, null, List(l0), l1, List(null)]) → f_C''07([null, null, List(l0), l1, List(null)])
f_C'07([List(l3), null, List(l0), l1, List(List(l3))]) → f_C'08([true, null, List(l0), l1, List(List(l3))])
f_C'08([true, null, List(l0), l1, List(List(l3))]) → f_C'09([null, null, List(l0), l1, List(List(l3))])
f_C'09([null, null, List(l0), l1, List(List(l3))]) → f_C'10([List(List(l3)), null, List(l0), l1, List(List(l3))])
f_C'10([List(List(l3)), null, List(l0), l1, List(List(l3))]) → f_C'11([List(l3), null, List(l0), l1, List(List(l3))])
f_C'11([List(l3), null, List(l0), l1, List(List(l3))]) → f_C'12([null, null, List(l0), l1, List(l3)])
f_C'12([null, null, List(l0), l1, List(l3)]) → f_C'13([null, null, List(l0), l1, List(l3)])
f_C'13([null, null, List(l0), l1, List(l3)]) → f_C'14([null, null, List(l0), l1, List(l3)])
f_C'14([null, null, List(l0), l1, List(l3)]) → f_D04([null, null, List(l0), l1, List(l3)])
f_D04([null, null, List(l0), l1, List(l3)]) → f_S04([null, null, List(l0), l1, List(l3)])
f_C''07([null, null, List(l0), l1, List(null)]) → f_C''08([false, null, List(l0), l1, List(null)])
f_C''08([false, null, List(l0), l1, List(null)]) → f_C''15([null, null, List(l0), l1, List(null)])
f_C''15([null, null, List(l0), l1, List(null)]) → f_C''16([null, null, List(l0), l1, List(null)])
f_C''16([null, null, List(l0), l1, List(null)]) → f_B17([null, null, List(l0), l1, List(null)])
f_B17([null, null, List(l0), l1, List(null)]) → f_B'17([null, null, List(l0), l1, List(null)])
f_B17([null, null, List(l0), l1, List(null)]) → f_B''17([null, null, List(List(null)), l1, List(null)])
f_B'17([null, null, List(l0), l1, List(null)]) → f_B'18([List(null), null, List(l0), l1, List(null)])
f_B'18([List(null), null, List(l0), l1, List(null)]) → f_B'19([l1, List(null), List(l0), l1, List(null)])
f_B'19([l1, List(null), List(l0), l1, List(null)]) → f_B'20([null, null, List(l0), l1, List(l1)])
f_B'20([null, null, List(l0), l1, List(l1)]) → f_B'21([null, null, List(l0), l1, List(l1)])
f_B''17([null, null, List(List(null)), l1, List(null)]) → f_B''18([List(null), null, List(List(null)), l1, List(null)])
f_B''18([List(null), null, List(List(null)), l1, List(null)]) → f_B''19([l1, List(null), List(List(null)), l1, List(null)])
f_B''19([l1, List(null), List(List(null)), l1, List(null)]) → f_B''20([null, null, List(List(l1)), l1, List(l1)])
f_B''20([null, null, List(List(l1)), l1, List(l1)]) → f_B''21([null, null, List(List(l1)), l1, List(l1)])

Figure 11: The cTRS of append.

by an extra variable $x'$ in $\mathsf{ts}^*(t)$. To simplify the presentation, we assume that $x'$
is the only extra variable in $\mathsf{ts}^*(t)$. Let $m$ be a morphism such that $m\colon s \to s'$.
Without loss of generality, we assume that there exists an address $a$ in $s'$ such that
$m(p) \to^{*} a \;{}^{*}\!\!\leftarrow m(q)$. By definition of Putfield, $m(p)$ and $m(q)$ exist in $t'$ and only
the part of the heap reachable from these addresses can differ in $s'$ and $t'$.

In order to show the admissibility of the rewrite step $f_s(\mathsf{ts}(s')) \to f_t(\mathsf{ts}(t'))$ we define
a substitution $\rho$ such that $\mathsf{ts}(s)\rho = \mathsf{ts}(s')$ and $\mathsf{ts}^*(t)\rho = \mathsf{ts}(t')$. We set:

\[
\rho(y) :=
\begin{cases}
\tau(x) & \text{if } y = x' \\
\sigma(y) & \text{otherwise.}
\end{cases}
\]

Then $\mathsf{ts}(s)\rho = \mathsf{ts}(s')$ by definition as $x' \notin \mathsf{Var}(s)$. On the other hand $\mathsf{ts}^*(t)\rho = \mathsf{ts}(t')$
follows as the definition of $\rho$ forces the correct instantiation of $x'$ and Lemma 7.2
implies that $\sigma$ and $\tau$ coincide on the portion of the heap not changed by the
Putfield instruction.

Finally, we have to verify the size condition on $x'$. However, by construction we
have

\[ |\rho(x')| = |\tau(x)| \leq |f_s(\mathsf{ts}(s))\rho| + |f_t(\mathsf{ts}^*(t))\{x' \mapsto 0\}\rho| . \]

□ 

Let $G$ be the computation graph of $P$. Further, let $\mathcal{R}$ denote the collection of rules
representing $G$ according to Definition 8.5.

Our definition of the obtained cTRS $\mathcal{R}$ is non-constructive, as neither the definition
of a special address (cf. Definition 8.2) nor the definition of joinable references (cf. Definition 8.4)
is computable. This does not affect our theoretical results, but calls for a
suitable approximation in the implementation of the transformation. For that one can either
employ a general shape analysis as for example detailed in [31, 2^, 25] or employ the
annotation technique proposed in [27]. We use the latter possibility in the implementation
of our technique.

The next lemma emphasises that any execution step is represented by at least one and
at most four rewrite steps with respect to $\mathcal{R}$.

Lemma 8.4. Let $s$ be a state in $G$ and let $s'$ be a JVM state such that $s' \sqsubseteq s$. Then
$P\colon s' \to_1 t'$ implies the existence of a state $t \in G$ such that $t' \sqsubseteq t$ and
$f_s(\mathsf{ts}(s')) \xrightarrow{\leqslant 4}_{\mathcal{R}} f_t(\mathsf{ts}(t'))$. (Here $\xrightarrow{\leqslant m}_{\mathcal{R}}$ denotes at least one and at most $m$ many rewrite steps in $\mathcal{R}$.)

Proof. The lemma follows from the proof of Lemma 7.2 and Lemma 8.3. □

We arrive at the main result of this paper. 

Theorem 8.1. Let $s'$ and $t'$ be JVM states. Suppose $P\colon s' \to^{*} t'$, where $s'$ is reachable in
$P$ from some initial state start. Then there exists an abstraction $s$ of $s'$ and a derivation
$f_s(\mathsf{ts}(s')) \to^{*}_{\mathcal{R}} f_t(\mathsf{ts}(t'))$ such that $t' \sqsubseteq t$. Furthermore $\mathsf{rc}_{\mathsf{jvm}}(n) \leq \mathsf{rc}_{\mathsf{trs}}(n) \leq 4 \cdot \mathsf{rc}_{\mathsf{jvm}}(n)$.

Proof. Due to Theorem 7.1 there exists a state $s$ such that $s' \sqsubseteq s$. Now, let $m$ denote the
runtime of the execution $P\colon s' \to^{*} t'$. Then by induction on $m$ in conjunction with
Lemma 8.4 we obtain the existence of a state $t$ such that $t' \sqsubseteq t$ and a derivation $D$:

\[ f_s(\mathsf{ts}(s')) \xrightarrow{\leqslant 4m}_{\mathcal{R}} f_t(\mathsf{ts}(t')) . \tag{1} \]

In particular we have $f_s(\mathsf{ts}(s')) \to^{*}_{\mathcal{R}} f_t(\mathsf{ts}(t'))$ from which the first part of the theorem
follows.

In order to conclude the second part, suppose $m$ denotes the runtime of the execution
$P\colon \mathit{start} \to^{*} t'$. As $G$ is the computation graph of $P$ we obtain $\mathit{start} \sqsubseteq I$. By our
assumptions on the input data to $P$, Lemma 8.2 yields $|\mathsf{ts}(\mathit{start})| = \|\mathit{start}\|$. Specialising
(1) to $I$ and $\mathit{start}$ yields $f_I(\mathsf{ts}(\mathit{start})) \xrightarrow{\leqslant 4m}_{\mathcal{R}} f_t(\mathsf{ts}(t'))$. Thus we obtain

\[ m \leq \mathsf{rc}_{\mathsf{trs}}(|\mathsf{ts}(\mathit{start})|) = \mathsf{rc}_{\mathsf{trs}}(\|\mathit{start}\|) \leq 4m . \]

As $m$ was arbitrary and $m \leq \mathsf{rc}_{\mathsf{jvm}}(\|\mathit{start}\|)$, the second part of the theorem follows. □
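The counting argument behind the induction can be spelled out as follows. The symbol $k_i$ is introduced here only for illustration: it stands for the number of rewrite steps that simulate the $i$-th execution step, so that Lemma 8.4 gives $1 \leq k_i \leq 4$ for every $i$. For an execution of $m$ steps the simulating derivation then has length

\[
m \;=\; \sum_{i=1}^{m} 1 \;\leq\; \sum_{i=1}^{m} k_i \;\leq\; \sum_{i=1}^{m} 4 \;=\; 4m .
\]

This is where the linear factor of four between the two runtime measures comes from.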

Corollary 8.1. The computation graph method, that is, the transformation from a given
JBC program $P$ to a cTRS $\mathcal{R}$, is non-termination preserving.

Proof. Reasoning similarly as in the proof of Corollary 7.1 we obtain that the proposed
transformation from $P$ to $\mathcal{R}$ is non-termination preserving. □

9. Implementation 

A prototype of the proposed method has been implemented in the Haskell programming 
language. 

Motivated by abstract interpretation we allow different instantiations of abstract domains.
In particular we allow instances for IntDomain and Memory Model:

Integer operations are usually performed locally, i.e., independently of other components
of the state. To define an instance of an abstract integer domain, arithmetic
operations, a widening operator and an instance check have to be defined. One can easily
provide different domains, such as the domain of intervals or the domain of signs. A
memory model allows different representations of the heap abstraction, for example a
sharing domain as presented here, or a distinctness domain as described initially by Otto
et al. [8]. The choice of a memory model has a significant impact on state representation
and state operations. Therefore it is necessary to provide all functions that access
or modify the dynamic part of the memory. States themselves are represented in a natural
way using algebraic data types. Most operations, such as the unification algorithm or
the widening operation, are reduced to a map operation over the state, although we usually
have to memorise visited nodes in the heap to circumvent looping, as well as corresponding
addresses (cf. Definition 6.6) of two different states.
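To make the three ingredients of an abstract integer domain concrete (arithmetic, widening, instance check), here is a minimal sketch of an interval domain. It is written in Python for brevity and is not the prototype's Haskell code; the class name `Interval` and its method names are assumptions of this sketch, and the widening shown is the textbook one that jumps to infinity on any growing bound.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Interval:
    lo: float   # lower bound, may be -inf
    hi: float   # upper bound, may be +inf

    def add(self, other: "Interval") -> "Interval":
        """Abstract addition: add the bounds componentwise."""
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def sub(self, other: "Interval") -> "Interval":
        """Abstract subtraction: smallest minus largest, largest minus smallest."""
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def join(self, other: "Interval") -> "Interval":
        """Least interval containing both arguments."""
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        """Keep stable bounds, replace growing bounds by infinity."""
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)

    def is_instance_of(self, other: "Interval") -> bool:
        """Instance check: self is contained in other."""
        return other.lo <= self.lo and self.hi <= other.hi


counter = Interval(0, 0)
after_one_iteration = counter.add(Interval(1, 1))      # [1, 1]
widened = counter.widen(counter.join(after_one_iteration))
print(widened)
print(after_one_iteration.is_instance_of(widened))
```

Widening a loop counter in this way is what keeps the number of distinct abstract states finite, at the price of losing the upper bound.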

The construction of the computation graph itself is independent of the chosen domains
and is a variant of the worklist algorithm. Currently different paths are processed
sequentially, which avoids the need for synchronisation when merging multiple states. The
method proposed here does not explicitly perform the (non-)cyclicity analysis needed for
the translation into the rewrite system. To infer a bound for our motivating example
we introduce additional annotations in the spirit of [27] in our implementation. TcT is
able to infer a linear bound from the resulting rewrite system.
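A generic worklist construction of a graph can be sketched as below. This is a simplified illustration, not the prototype's algorithm: the function name `build_graph` and the `successors` callback are hypothetical, states are assumed to be hashable, and the sketch assumes that only finitely many states are reachable (in the actual method, finiteness is ensured by merging and widening abstract states, which is omitted here).

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def build_graph(initial: Hashable,
                successors: Callable[[Hashable], Iterable[Hashable]]
                ) -> Tuple[Set[Hashable], List[Tuple[Hashable, Hashable]]]:
    """Explore all states reachable from `initial`, one state at a time."""
    nodes: Set[Hashable] = {initial}
    edges: List[Tuple[Hashable, Hashable]] = []
    worklist = deque([initial])
    while worklist:
        state = worklist.popleft()          # paths are processed sequentially
        for nxt in successors(state):
            edges.append((state, nxt))
            if nxt not in nodes:            # only unseen states are scheduled
                nodes.add(nxt)
                worklist.append(nxt)
    return nodes, edges


# Toy transition relation: a counter modulo 3, so the graph is a single cycle.
nodes, edges = build_graph(0, lambda n: [(n + 1) % 3])
print(sorted(nodes))
print(edges)
```

Because each state enters the worklist at most once, the loop terminates whenever the set of reachable states is finite.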
10. Conclusion and Future Work



In this paper we define a representation of JBC executions as computation graphs from
which we obtain a representation of JBC executions as constrained rewrite systems. We
make the widening of abstract states precise so that the representation of JBC executions is
provably finite. Furthermore, we show that the resulting transformation is complexity
preserving by a linear factor.

As emphasised above, our approach does not directly give rise to an automatable
complexity-preserving transformation; for that it requires an extension by an external
shape analysis. This may appear as a deficit, as in principle we aim at automatable
complexity-preserving transformations. However, our main result applies to any computable
approximation of the transformation and in particular it shows complexity preservation
by a linear factor of the transformation proposed by Otto et al. [27]. Moreover, it allows
for an easy incorporation of the existing wealth of results on shape analysis present in
the literature and thus improves upon the modularity of the proposed transformational
approach. To assess the viability of our method in practice, we have implemented
a suitable approximation as a prototype that can be used as a frontend of TcT. Unsurprisingly,
this set-up can handle our simple motivating example, but it is too early to
attempt a sensible experimental assessment.

For that, we crucially have to overcome the second and third obstacle mentioned in the
introduction: a) methods of runtime complexity analysis for cTRSs need to be developed
and b) compositionality of the analysis is required. Item a) clarifies why we have crafted
the transformation such that constrained TRSs are obtained rather than integer TRSs (as
in the termination graph approach). Based on very recent results in [19] on the versatility
of cTRSs, we expect that it will be relatively easy to establish powerful complexity
analysis for cTRSs. Furthermore, compositionality of the analysis currently amounts to
the question whether it is possible to study individual cycles in the computation graph
separately. Both these questions will be investigated in the future.
Semantics of Jinja Bytecode Instructions

For each instruction, the first line gives the state before execution and the line starting with → gives the state after execution.

Load n
    (heap, (stk, loc, cn, mn, pc) :: frms)
    → (heap, (loc(n) :: stk, loc, cn, mn, pc + 1) :: frms)

Store n
    (heap, (v :: stk, loc, cn, mn, pc) :: frms)
    → (heap, (stk, loc{n ↦ v}, cn, mn, pc + 1) :: frms)

Push v
    (heap, (stk, loc, cn, mn, pc) :: frms)
    → (heap, (v :: stk, loc, cn, mn, pc + 1) :: frms)

Pop
    (heap, (v :: stk, loc, cn, mn, pc) :: frms)
    → (heap, (stk, loc, cn, mn, pc + 1) :: frms)

New cn'
    (heap, (stk, loc, cn, mn, pc) :: frms)
    → (heap{a ↦ obj}, (a :: stk, loc, cn, mn, pc + 1) :: frms)

Getfield fn cn'
    (heap, (a :: stk, loc, cn, mn, pc) :: frms)
    → (heap, (ftable(cn', fn) :: stk, loc, cn, mn, pc + 1) :: frms)

Putfield fn cn'
    (heap, (v :: a :: stk, loc, cn, mn, pc) :: frms)
    → (heap{a ↦ (cn'', ftable')}, (stk, loc, cn, mn, pc + 1) :: frms)

Checkcast cn'
    (heap, (v :: stk, loc, cn, mn, pc) :: frms)
    → (heap, (v :: stk, loc, cn, mn, pc + 1) :: frms)

Invoke mn' n
    (heap, (p_{n-1} :: … :: p_0 :: a :: stk, loc, cn, mn, pc) :: frms)
    → (heap, frm' :: (p_{n-1} :: … :: p_0 :: a :: stk, loc, cn, mn, pc) :: frms)

Return
    (heap, [frm])
    → (heap, [])
    (heap, (v :: stk, loc, cn, mn, pc) :: frm :: frms)
    → (heap, frm' :: frms)

IAdd
    (heap, (i2 :: i1 :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((i2 + i1) :: stk, loc, cn, mn, pc + 1) :: frms)

IfFalse i
    (heap, (false :: stk, loc, cn, mn, pc) :: frms)
    → (heap, (stk, loc, cn, mn, pc + i) :: frms)
    (heap, (true :: stk, loc, cn, mn, pc) :: frms)
    → (heap, (stk, loc, cn, mn, pc + 1) :: frms)

CmpEq
    (heap, (v2 :: v1 :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((v2 = v1) :: stk, loc, cn, mn, pc + 1) :: frms)

Goto i
    (heap, (stk, loc, cn, mn, pc) :: frms)
    → (heap, (stk, loc, cn, mn, pc + i) :: frms)

ISub
    (heap, (i2 :: i1 :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((i2 − i1) :: stk, loc, cn, mn, pc + 1) :: frms)

CmpNeq
    (heap, (v2 :: v1 :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((v2 ≠ v1) :: stk, loc, cn, mn, pc + 1) :: frms)

CmpGeq
    (heap, (v2 :: v1 :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((v2 ≥ v1) :: stk, loc, cn, mn, pc + 1) :: frms)

BAnd
    (heap, (b2 :: b1 :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((b2 ∧ b1) :: stk, loc, cn, mn, pc + 1) :: frms)

BOr
    (heap, (b2 :: b1 :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((b2 ∨ b1) :: stk, loc, cn, mn, pc + 1) :: frms)

BNot
    (heap, (b :: stk, loc, cn, mn, pc) :: frms)
    → (heap, ((¬b) :: stk, loc, cn, mn, pc + 1) :: frms)
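A few of the transitions above can be turned into a small interpreter, which also makes the notion of runtime (number of executed instructions) tangible. This is a simplified sketch under several assumptions: it models a single frame as an operand stack, a list of registers and a program counter, it leaves out the heap, the class and method names and all object instructions, and the names `step` and `run` as well as the tuple encoding of instructions are chosen here for illustration.

```python
from typing import List, Tuple

Instr = Tuple  # ("Push", 2), ("IAdd",), ("Goto", 3), ...


def step(prog: List[Instr], stk: list, loc: list, pc: int):
    """Execute the instruction at pc; the top of the stack is stk[0]."""
    op, *args = prog[pc]
    if op == "Load":
        return [loc[args[0]]] + stk, loc, pc + 1
    if op == "Store":
        new_loc = list(loc)
        new_loc[args[0]] = stk[0]
        return stk[1:], new_loc, pc + 1
    if op == "Push":
        return [args[0]] + stk, loc, pc + 1
    if op == "Pop":
        return stk[1:], loc, pc + 1
    if op == "IAdd":
        return [stk[0] + stk[1]] + stk[2:], loc, pc + 1
    if op == "CmpEq":
        return [stk[0] == stk[1]] + stk[2:], loc, pc + 1
    if op == "IfFalse":                      # relative jump only on false
        offset = args[0] if stk[0] is False else 1
        return stk[1:], loc, pc + offset
    if op == "Goto":                         # unconditional relative jump
        return stk, loc, pc + args[0]
    raise ValueError(f"unsupported instruction: {op}")


def run(prog: List[Instr], loc: list):
    """Run until pc leaves the program; return final stack, registers, step count."""
    stk, pc, steps = [], 0, 0
    while 0 <= pc < len(prog):
        stk, loc, pc = step(prog, stk, loc, pc)
        steps += 1
    return stk, loc, steps


prog = [("Push", 2), ("Push", 3), ("IAdd",), ("Store", 0),
        ("Load", 0), ("Push", 5), ("CmpEq",), ("IfFalse", 3),
        ("Push", 1), ("Goto", 2), ("Push", 0)]
print(run(prog, [0]))   # ([1], [5], 10)
```

The returned step count is the quantity that the runtime complexity of a bytecode program measures as a function of the input size.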